(57) Abstract

In a system for compressing video data, the degree of global motion between a plurality of successive frames is determined for use 
in designating and spacing reference frames, relative to the global motion exceeding predetermined thresholds or levels of motion between 
certain ones of the frames. 

METHOD AND APPARATUS FOR VIDEO DATA COMPRESSION 
USING TEMPORALLY ADAPTIVE MOTION INTERPOLATION 

Field of the Invention 

The field of the present invention relates generally to 
digital data compression, and more particularly relates to the 
coding of successive frames of video data in a selective 
manner providing enhanced data compression. 

Background Of The Invention 

The availability of high speed digital devices and large, 
fast memories has made it possible to give practical
expression to an old idea of more efficiently utilizing a 
given bandwidth for a video transmission by only transmitting 
encoded digital signals representing the changes between 
successive frames, or groups of frames. To achieve high 
compression, one must resort not only to redundancy reduction
but also to irrelevancy reduction, coarse coding that exploits 
characteristics of human visual perception. Spatial limits in 
human vision have been exploited extensively in many systems, 
especially in adaptive quantization using the discrete cosine 
transform (e.g., in the DCT quantization matrix), and in other 
techniques such as subband coding, and multiresolution 
representation. Temporal data reduction is based upon the 
recognition that between successive frames of video images 
there is high correlation. However, there has been very
little work on applying temporal characteristics of human
vision to image coding systems, except in the most basic ways
such as determining a frame rate (e.g., 24-60 frames/sec).
This is partly because of the anticipated higher complexity of
temporal processing than of spatial processing, and the
difficulty of including the temporal dimension in defining a
standard measure of perceptual quality for video sequences.

In a standard promulgated in November 1991 by the Motion
Picture Expert Group (MPEG), identified as ISO-IEC/JTC1/SC2/WG11,
the sequence of raw image data frames is divided into
successive groups known as GOPs (groups of pictures), and the
coded GOP is comprised of independent frames I, predicted
frames P and bidirectionally predicted frames B in such manner
that a GOP may be comprised, for example, as follows:

I, B, P, B, B, P, B, B, P, B, B, P, B, B.

The first P frame is derived from the previous I frame,
while the remaining P frames are derived from the last
previous P frame. The I and P frames are reference frames.

Since each B frame is derived from the closest reference
frames on either side of it, the pertinent P frames must be
derived before the prior B frames can be derived.
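The dependency just described implies that frames are coded in a different order than they are displayed. The following is a minimal sketch of that reordering, not the patent's own procedure; the function name `coding_order` and the single-letter frame labels are illustrative assumptions.

```python
def coding_order(display_types):
    """Return display indices in an order where each B frame comes after
    the reference frames (I or P) on both sides of it.

    display_types: sequence of "I", "P" or "B" in display order.
    """
    order = []
    pending_b = []  # B frames waiting for their later reference frame
    for index, frame_type in enumerate(display_types):
        if frame_type in ("I", "P"):
            order.append(index)      # the reference frame is coded first
            order.extend(pending_b)  # then the B frames that preceded it
            pending_b = []
        else:
            pending_b.append(index)
    # Trailing B frames depend on the next GOP's I frame, which this
    # sketch does not see; they are simply emitted last.
    order.extend(pending_b)
    return order
```

For a display sequence I, B, B, P, B, B, P the sketch yields the order 0, 3, 1, 2, 6, 4, 5: each P frame is emitted ahead of the B frames that precede it on the screen.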

The high definition independent frames I at the beginning
of each GOP are required because of the use of frame
differential encoding to avoid accumulated error. The purpose
of quantizing is to control the number of bits used in
representing the changes between the frames. The
corresponding portions of frames are conveyed by motion



pixel subsampling in both spatial directions and more 
reduction in calculations can be achieved by using backward 
telescopic searches rather than forward telescopic searches. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Various embodiments of the present invention are 
described below with reference to the drawings, in which 
substantially similar components are identified by the same 
reference designation, wherein: 

Fig. 1 shows equivalent intensity pulses illustrating 
luminance perception by the human eye under Bloch's law. 

Fig. 2A shows a plot of the frequency response function 
of a first order low pass filter model of the temporal 
frequency response of the human eye. 

Fig. 2B is a curve showing the contrast sensitivity
versus the temporal frequency for a typical temporal 
modulation transfer function. 

Fig. 3 shows a block diagram of a video encoder providing 
temporally adaptive motion interpolation (TAMI) of a group of 
pictures (GOP), for one embodiment of the invention. 

Fig. 4 shows a flowchart for the TAMI programming or 
algorithm associated with the encoder of Fig. 3, for one 
embodiment of the invention. 

Fig. 5 shows plots or curves of signal-to-noise ratio
(SNR) versus frame number for comparing 0-P processing when
the number of P1 frames is not limited, versus the
signal-to-noise ratio obtained when the number of P1 frames is
limited, in one embodiment of the invention.

Fig. 6 illustrates a telescopic search for motion 
estimation between successive frames as used in one embodiment 
of the invention. 

Figs. 7A through 7D show default group of picture (GOP)
structures for N P1 frames, for N=0, 1, 2, and 3,
respectively, as used in embodiments of the invention.

Fig. 8A illustrates a GOP structure for a 1-P scheme with
scene changes of Type 1 and Type 0, in one embodiment of the
invention.

Fig. 8B illustrates a GOP structure for a 1-P scheme with
a scene change of Type 1 only, and no detection of Type 0
scene changes.

Fig. 9 shows a variable bit rate TAMI encoder for another 
embodiment of the invention. 

Fig. 10 shows a flowchart of a variable bit rate TAMI 
algorithm associated with the encoder of Fig. 9. 

Fig. 11 shows a flowchart for providing an optimal 
spacing algorithm for another embodiment of the invention. 

Fig. 12 shows a GOP structure using the 2-P optimal 
spacing algorithm of Fig. 11, when there is a scene change of
Type 1. 

Fig. 13 illustrates a backward telescopic search for use 
with the optimal spacing algorithms of various embodiments of 
the invention. 



Figs. 14(a) through 14(e) show signal-to-noise ratio 
(SNR) curves for images with little motion when the average 
bit-rate is 736.5Kbit/sec (Tennis), for comparing a 
conventional fixed 4-P scheme, and 0-P, 1-P, 2-P, and 3-P
schemes, respectively. 

Figs. 15(a) through 15(e) are related to the SNR curves 
of Figs. 14(a) through 14(e), respectively, for showing the 
corresponding bit rate per frame, where the average bit rate 
is 736.5Kbit/sec (Tennis). 

Figs. 16(a) through 16(e) show SNR curves derived from a 
plurality of successive frames of a GOP with a scene change 
having an average bit rate of 736.5Kbit/sec (Tennis), for
conventional fixed 4-P, and 0-P, 1-P, 2-P, and 3-P schemes,
respectively, for an embodiment of the invention. 

Figs. 17(a) through 17(e) show curves for the bit rate 
versus successive frame numbers for high temporal activity 
regions with an abrupt scene change, when the average bit rate 
is 736.5Kbit/sec (Tennis), corresponding to Figs. 16(a)
through 16(e), respectively.

Figs. 18(a) through 18(e) show SNR curves versus 
successive frame numbers for images with little motion when
the bit rate is 300Kbit/sec (Tennis), for conventional fixed
4-P, and for inventive embodiments for 0-P, 1-P, 2-P, and 3-P
schemes, respectively. 

Figs. 19(a) through 19(e) show the bit rates versus 
successive frame numbers corresponding to Figs. 18(a) through 
18(e), respectively.



Fig. 20 is a table showing the performance of different 
interpolation schemes for images with little motion activity 
at average bit rates of 736.5Kbit/sec, for conventional fixed
4-P, and inventive 0-P, 1-P, 2-P, and 3-P embodiments,
respectively. 

Fig. 21 is a table for showing the performance of 
different interpolation schemes for images containing a scene 
change at an average bit rate of 736.5Kbit/sec. 

Fig. 22 shows a table for illustrating the performance of
different interpolation schemes for images with little motion 
activity at an average bit rate of 300Kbit/sec. 

Fig. 23(a) shows SNR curves for comparing the performance 
of the FBR-TAMI and VBR-TAMI embodiments of the invention,
respectively, having the same average bit rate of 663Kbit/sec. 

Fig. 23(b) shows the bit rates per frame for comparing 
curves in using FBR-TAMI, and VBR-TAMI, respectively, with the 
same average bit rate of 663Kbit/sec. 

Figs. 24(a) through 24(e) show the distances between a 
current frame and a first frame for frame numbers 120 through 
180, using DOH, HOD, BH, BV, and MCE measurement methods,
respectively. 

Figs. 25(a) through 25(e) show SNR curves of an optimal 
spacing algorithm (OSA) of the present invention using DOH, 
HOD, BH, BV, and MCE measurements, respectively. 

Fig. 26 shows a table for tabulating the results of SNR 
and bit-rate results using the OSA embodiment of the invention 
without B2 frames for the five different distance measurement 
methods, and for three different frame number ranges.



Fig. 27 shows a table for tabulating SNR and bit-rate
results using the OSA embodiment of the invention with B2 
frames for the five different distance measures, and for three 
different frame number ranges. 

Figs. 28(a) through 28(e) show curves of SNR versus frame 
numbers 90 through 150 through use of the embodiment of the 
invention of an adaptive optimal spacing algorithm (OSA) with 
B2 frames using distance measurement methods, DOH, HOD, BH, 
BV, and MCE, respectively. 

Fig. 29 shows a composite of curves for comparing the
TAMI and OSA embodiments of the invention.

Fig. 30 shows distances between one frame and others 
relative to temporal segments using a Type 0 threshold set at
τ.

Fig. 31 shows an algorithm for another embodiment of the 
invention designated BSE-TAMI (binary search equidistant
TAMI).

Fig. 32 shows a flowchart relative to both TAMI and OSA
embodiments of the invention. 

Fig. 33 shows a flowchart for the steps involved in a
scene change detection step generally called for in the 
flowchart of Fig. 32. 

Fig. 34 shows another flowchart for a scene change 
detection method for another embodiment of the invention 
designated N-P TAMI, relative to a scene detection step of a 
flowchart of Fig. 32. 



Fig. 35 shows a detailed flowchart of coding steps 
relative to one or more generalized coding steps shown in the 
flowcharts of Figs. 32, 36A, and 36B.

Figs. 36A and 36B each show portions of the processing 
steps associated with the "MAIN" step of the flowchart of Fig. 
32, in another embodiment of the invention. 

Fig. 37 is a flowchart showing details for the MEP step 
of the flowchart of Fig. 36A. 

Fig. 38 is a flowchart showing the MEI step of the 
flowchart of Fig. 36A. 

Fig. 39 is a block schematic diagram showing a hardware 
configuration for carrying out various embodiments of the 
invention. 

Fig. 40 is a block schematic diagram showing a portion of 
a scene change detector generally shown in Fig. 39. 

Fig. 41 shows a block schematic diagram of a distance 
computation unit generally shown in the diagram of Fig. 40. 

Fig. 42 is a block schematic diagram of a Type 1 scene
change detector shown generally in Fig. 40.

Fig. 43 is a block schematic diagram showing a Type 0 
scene change detector shown generally in Fig. 40. 

Fig. 44 shows a block schematic diagram of a GOP 
structure generation unit shown generally in Fig. 40. 

Fig. 45 shows a block schematic diagram of a scene 
detector controller module shown generally in Fig. 40. 

Fig. 46 shows a block schematic diagram of a motion 
compensator module shown generally in Fig. 39. 



Fig. 47 shows a truth table for a switch control block or 
module of the motion compensator shown in Fig. 46.

Fig. 48 shows a block schematic diagram of a motion 
estimator module for the motion compensator shown in Fig. 46. 

Fig. 49 shows a block schematic diagram of a telescopic 
motion estimator controller shown in the schematic of Fig. 48.

Fig. 50 shows a block schematic diagram and a table for 
the bit rate controller module of the system shown in Fig. 39. 

Fig. 51 shows a block schematic diagram for a scene 
change detector configuration associated with the encoder of 
Fig. 39, for a BSE-TAMI embodiment of the invention.

Fig. 52 shows a block schematic diagram of a binary 
search unit associated with the scene change detector of Fig. 
51. 

Fig. 53 shows a block schematic diagram of a generalized 
subband video encoder incorporating TAMI for another 
embodiment of the invention. 

Fig. 54 is a simplified diagram showing a multi- 
resolution motion estimation method associated with the 
encoder of Fig. 53. 

Fig. 55 shows a block scan mode for a differential pulse 
code modulation (DPCM) scheme using horizontal prediction from 
the left, with vertical prediction for the first column. 

Fig. 56 shows a horizontal block scan mode relative
to the subband encoding system of Fig. 53. 

Fig. 57 shows a vertical block scan mode relative to the 
subband encoding system of Fig. 53. 



Fig. 58 shows a table of performance comparisons between 
various embodiments of the invention. 

Fig. 59 shows a block schematic diagram supplementary to 
Fig. 53 for providing greater details of a system for the 
subband video encoding embodiment of the invention. 

Fig. 60 shows a block schematic diagram of the subband
analysis module of the subband video encoding system of Fig.
59.



Detailed Description Of The Preferred
Embodiments Of The Invention

One of the perceptual factors exploited in one embodiment 
of the present coding scheme is temporal masking, which is 
closely related to but different from temporal summation in 
low level perception. There has been little prior work in 
exploiting temporal masking for image coding. 

Temporal summation has been known for over a century as
Bloch's law. The law says that below a critical time period
or duration (T), about 100ms (milliseconds), luminance
perception by the human eye is constant as long as the product
of time duration (T) and intensity (I) is kept constant,
namely:

$$I \times T = k \tag{1}$$

This describes a kind of temporal summation (integration)
occurring in the human visual system. Fig. 1 illustrates the
temporal summation effect, in which luminance perception
depends only on the total area of the pulses 1, 2, or 3
vectors indicating where blocks in the reference frames may be
located in the frame being derived. Since differences 
generally arise from motion and there is likely to be more 
motion between P frames than between a P and a B frame, more 
bits are required to derive a P frame from a P frame than in 
deriving B frame from P frames on either side of it. 

In typical MPEG systems, the three frame spacing of 
reference frames utilizing high numbers of bits is required in 
order to adequately convey motion. If, however, there is 
little or no motion, the number of bits devoted to 
representing these frames is excessive. 

Summary Of The Invention 

One object of the invention is to provide an improved 
method and apparatus for video compression. 

Another object of the invention is to provide an improved 
method and apparatus for motion compensated interpolation 
coding for video compression, which is compatible with present 
video coding standards. 

In one embodiment of the invention, temporal segmentation 
is used to dynamically adapt motion interpolation structures 
for video compression. Temporal variation of the input video 
signal is used for adjusting the interval between reference 
frames. The bit rate control for the dynamic group of 
pictures (GOP) structure is based upon the temporal masking in 
human vision. In a preferred embodiment, the spacing between 
reference frames is based upon different temporal distance 



measurements, through use of an algorithm for temporally
adaptive motion interpolation (TAMI). These and other
embodiments of the invention are summarized in greater detail 
in the following paragraphs. 

Instead of rigidly locating reference frames within a GOP
as in a conventional MPEG system, their location and the
number of bits used depend in this invention on the amount of
global motion between frames. Global motion, as used in the
invention, is defined as the motion between frames as a whole,
and it can be measured in a number of known ways such as the
difference of histogram, the histogram of difference, block
histogram difference, block variance difference, and motion
compensation error.
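As an illustration of two of these measures, the sketch below computes a difference of histograms and a histogram-of-difference style measure for two grey-level frames. The exact definitions are not given in this passage, so the normalisation by pixel count, the `min_change` parameter, and all function names are assumptions made for the example only.

```python
def histogram(frame, bins=256):
    """Count how many pixels take each grey level 0..bins-1."""
    counts = [0] * bins
    for row in frame:
        for pixel in row:
            counts[pixel] += 1
    return counts


def difference_of_histograms(frame_a, frame_b, bins=256):
    """DOH (assumed form): summed absolute bin difference, per pixel."""
    hist_a = histogram(frame_a, bins)
    hist_b = histogram(frame_b, bins)
    num_pixels = sum(hist_a)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b)) / num_pixels


def histogram_of_difference(frame_a, frame_b, min_change=8):
    """HOD (assumed form): fraction of pixels whose grey level changed
    by more than min_change between the two frames."""
    total = 0
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if abs(pixel_a - pixel_b) > min_change:
                changed += 1
    return changed / total
```

Both functions return 0.0 for identical frames and grow as more of the picture changes, which is the property a global motion measure needs before it is compared against a threshold.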

In the following description, frames corresponding to I,
P, and B frames of the MPEG are designated as I1, P1, and B1
frames.

If there is no frame in a GOP where the global motion
between it and the I1 or first frame exceeds an empirically
determined value, T0, all of the remaining frames are of the B
Type and are derived from the I1 frame of that GOP and the I1
of the next GOP. Thus no P1 type frame is used and many bits
are saved, in this embodiment of the invention.

Should the global motion between a frame and a previous
reference frame exceed T0, the previous frame is designated
herein as a P1 frame. Thus, a P1 frame is used wherever
necessary to adequately convey the motion.



The global motion between adjacent frames is also
measured. If it exceeds an empirically determined value of T1,
it is an indication that an abrupt change of scene has
occurred between the frames. In this case the later frame is
designated as an I2 frame that is independently processed with
fewer bits than the I1 frame, and the immediately previous
frame is designated as a P2 frame that has fewer bits than a
P1 frame in this embodiment of the invention. The relative
coarseness of the P2 frame is not observed by the eye because
of a phenomenon known as backward masking, and the relative
coarseness of the I2 frame is not observed by the eye because
of a phenomenon known as forward masking.

It is apparent in both a conventional MPEG system and the 
system of this invention that reference frames must be 
processed before the B or B1 frames between them can be
processed. 

The method of operation just described uses bits only as 
required by global motion and is referred to infra as 
temporally adaptive motion interpolation, TAMI. 

When the system is used with a transmission channel such 
as used in digital television, the bit rate may be controlled 
by loading the processed bits into a buffer and controlling 
the number of levels used in the quantizer so as to keep a 
given number of bits in the buffer. In the event that two or
more successive frames have global motion in excess of T0 so
that the frames just prior to them are designated as good
resolution P1 frames, it is possible that controlling the bit
rate may cause a second P1 frame to be processed with fewer
bits than desired. In such a case, only the first P1 frame is
processed, and the frames between it and the next reference
frames are processed as B1 frames even though they may qualify
as P1 frames, in another embodiment of the invention.

Another way of controlling the quantizer so as not to 
exceed a fixed bit rate is to look at the total number of bits 
called for in a GOP if the nominal numbers of bits are used 
for processing the designated frames, and if it calls for a 
bit rate in excess of the fixed bit rate, the nominal numbers 
of bits are lowered proportionately as required. Thus, if 
there are too many P1 frames, the quantized levels are reduced
so that fewer bits are used in processing all of them, in 
another embodiment of the invention. 
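The proportional lowering of nominal bit counts can be sketched as follows. This is only an illustration of the idea under assumed inputs: `nominal_bits` is a hypothetical list of per-frame nominal allocations for one GOP and `gop_budget` is the number of bits the fixed bit rate allows for that GOP.

```python
def scale_allocations(nominal_bits, gop_budget):
    """Lower every frame's nominal allocation by the same factor when the
    GOP as a whole would exceed the fixed-rate budget."""
    total = sum(nominal_bits)
    if total <= gop_budget:
        return list(nominal_bits)  # already within budget, nothing to do
    factor = gop_budget / total
    # Truncate to whole bits so the scaled total never exceeds the budget.
    return [int(bits * factor) for bits in nominal_bits]
```

With nominal allocations of 400, 100, 100 and 400 bits and a budget of 500 bits, every frame is halved to 200, 50, 50 and 200 bits, so the relative weighting between frame types is preserved.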

If the coding system of this invention is coupled to a
distribution system, such as one using the asynchronous
transfer mode (ATM) concept of a broadband integrated
services digital network (ISDN), in another embodiment
variable bit rate (VBR) coding can be used with TAMI to form a
VBR-TAMI system because of the very wide effective bandwidth
of such a channel. This system is different from TAMI only in
the fact that the number of P1 frames is not limited.

In fixed bit rate TAMI (FBR-TAMI), there is, as in any
block motion compensation coding system, a tendency for
reference frames to be too far apart, e.g. when there is no
global motion in excess of T0, so as to produce perceptually
displeasing coding errors at moving edges.
Furthermore, the longest encoding delay in FBR-TAMI or
VBR-TAMI is equal to the duration of a GOP, which in the case
of a GOP of fifteen frames is one half a second, and thus may
be too long.

In order to alleviate these problems, N P1 reference
frames are inserted into the GOP structure by default, i.e.,
they are not called for by the global motion involved, in yet
another embodiment of the invention. This divides the GOP
into N+1 segments that may be processed in sequence so as to
reduce the time between reference frames and provide better
motion representation as well as to reduce the processing
time. As N increases the coding delay is reduced but so are
the bit rate savings.
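The trade-off between N and coding delay can be made concrete with a short sketch. It assumes that the N default P1 frames are spaced evenly through the GOP and that the delay is one segment's duration; neither assumption is stated here, and the 30 frames/sec default is inferred from fifteen frames lasting one half second. The function names are illustrative.

```python
def default_p1_positions(gop_size, n_default):
    """Evenly spaced frame indices (assumed layout) for N default P1
    frames in a GOP of gop_size frames, frame 0 being the I1 frame."""
    segment = gop_size / (n_default + 1)
    return [round(k * segment) for k in range(1, n_default + 1)]


def worst_case_delay_seconds(gop_size, n_default, frames_per_second=30.0):
    """Longest wait for the next reference frame, taken here as the
    duration of one of the N+1 segments."""
    return gop_size / (n_default + 1) / frames_per_second
```

For a fifteen-frame GOP, N=0 gives the half-second delay mentioned above, while N=2 places default P1 frames at indices 5 and 10 and cuts the delay to one third of that.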

If no frame in a GOP is designated as a P1 frame because
global motion from I1 did not exceed T0, a P1 frame is replaced
by a B1 frame so that |P1|-|B1| extra bits are available for
processing a B type frame. (The absolute value notation is
used to denote the number of bits allocated to a frame.) A B
Type frame with the extra bits is called a B2 frame, in this
embodiment of the invention. The relation between |B1| and
|B2| is given by the following expression:

$$|B2| = |B1| + \frac{|P1| - |B1|}{M - N - 1}$$

where M is the GOP size and N is the number of injected P1
frames.
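A worked instance of this allocation is sketched below. It assumes the relation reads |B2| = |B1| + (|P1| - |B1|) / (M - N - 1), that is, the bits saved by one unused P1 frame are shared equally among the M - N - 1 B-type frames of the GOP; the function and parameter names are illustrative.

```python
def b2_bits(b1_bits, p1_bits, gop_size, n_injected_p1):
    """Bits for a B2 frame under the assumed relation
    |B2| = |B1| + (|P1| - |B1|) / (M - N - 1)."""
    num_b_frames = gop_size - n_injected_p1 - 1  # all frames except I1 and the P1s
    if num_b_frames <= 0:
        raise ValueError("GOP has no B-type frames to receive the extra bits")
    return b1_bits + (p1_bits - b1_bits) / num_b_frames
```

For example, with |B1| = 100, |P1| = 400, M = 15 and N = 2, the 300 spare bits are spread over 12 frames, giving |B2| = 125.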



After the frames of a GOP have been designated as P1, and
B1 or B2 frames in any system or embodiment of the invention,
the calculations required for the interpolations of the B1, or
B2 frames may be accomplished by the usual motion compensation 
encoder, but it is preferable to use an encoder that uses 
telescopic motion vector searching as well as a different bit 
rate control system. 

In a preferred embodiment of the invention, P1 frames are
located so as to have as close to the same amount of motion
between them as possible rather than being located at a frame
having a motion with respect to the previous reference frame
that exceeds T0. Differences in motion for each frame are
generated, and as before, I2 and P2 frames are designated at a
scene change when the motion between them exceeds T1. A number
N of P1 frames to be used is assumed, and calculations are
made of the temporal distances between them and the reference
frames on either side for all combinations of N positions, and
the positions closest to the mean of all these measurements
are selected. This embodiment of the invention is designated
as OSA, for optimal spacing algorithm.
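The exhaustive search over combinations can be sketched as below. This is one possible reading, not the patent's stated procedure: "closest to the mean" is interpreted as minimising the summed absolute deviation of the segment distances from their mean, scene changes are ignored, and `dist(a, b)` is a hypothetical callable returning the global motion between frames a and b.

```python
from itertools import combinations


def optimal_spacing(num_frames, n_p1, dist):
    """Pick n_p1 frame indices so that the motion between consecutive
    reference frames is as equal as possible.

    Frame 0 is the I1 frame of this GOP and index num_frames stands for
    the I1 frame of the next GOP; dist(a, b) is assumed to be defined
    for all indices 0..num_frames.
    """
    best_positions = None
    best_spread = None
    for positions in combinations(range(1, num_frames), n_p1):
        refs = (0,) + positions + (num_frames,)
        segments = [dist(a, b) for a, b in zip(refs, refs[1:])]
        mean = sum(segments) / len(segments)
        spread = sum(abs(s - mean) for s in segments)
        if best_spread is None or spread < best_spread:
            best_positions, best_spread = positions, spread
    return best_positions
```

When motion is uniform the chosen positions are evenly spaced; when motion is concentrated in part of the GOP the positions move toward that part. The search cost grows combinatorially with N, so this brute-force form is only practical for small GOPs.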

In another embodiment of the invention the most 
advantageous number, N, of P frames is determined on a dynamic 
basis from the frame designations in each GOP. 

Another embodiment of the invention applies TAMI to 
subband video coding. In view of the fact that highly accurate 
motion vector information is not required in carrying out the 
algorithms associated with the various embodiments of the 
invention, the number of computations can be reduced by using 



regardless of the durations and the intensities of individual
pulses. It is known that temporal summation is a neural
phenomenon and that it does not occur at the photoreceptor
level, but researchers still do not know whether it occurs at
the ganglion cell level, the lateral geniculate nucleus (LGN),
or the cortex. The critical duration is about 100ms when a
spot of light is seen against a black background (scotopic
conditions). Under natural viewing conditions (photopic
conditions), the critical duration is much shorter, on the
order of 20ms. For a task involving perception of image
content, such as being able to tell object orientation in a 
test image, the critical duration can be as long as several 
hundred milliseconds. Because of this summation process, 
human vision has limited temporal resolution, and the critical 
duration is generally not less than 20 to 100ms. This is the 
main psychophysical factor that influenced the choices of 
frame rates for movies and television. 
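Bloch's law (Equation 1) can be illustrated numerically. The sketch below is an assumption-laden toy, not a vision model: it treats two pulses as equally bright whenever their intensity-duration products match, and it refuses pulses longer than a `critical_duration` that defaults to the 100ms figure quoted above.

```python
import math


def equally_bright(intensity_a, duration_a, intensity_b, duration_b,
                   critical_duration=0.1):
    """True if two pulses have the same I x T product (durations in
    seconds); the law is only applied below the critical duration."""
    if max(duration_a, duration_b) > critical_duration:
        raise ValueError("Bloch's law applies only below the critical duration")
    return math.isclose(intensity_a * duration_a, intensity_b * duration_b)
```

A 50ms pulse of intensity 2 and a 25ms pulse of intensity 4 have the same product and are therefore reported as equally bright, whereas doubling the intensity at the same duration is not.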

The present invention is concerned with the temporal 
masking aspect of vision. A simple low pass filter model is 
used to characterize the phenomenon. It is adequate to model 
human temporal processing as a leaky integrator, i.e., a 
first-order low pass filter. 

The temporal transfer function, expressed as a Laplace
transform, can be modeled by:

$$H(s) = \frac{1}{1 + sT} \tag{2}$$

where T is recovery time (critical duration), which is about
100 to 200ms. The frequency response, expressed in terms of a
Fourier transform, is given by:
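As a sketch, assuming the transfer function of Equation (2) is the first-order low pass H(s) = 1/(1 + sT) and that the frequency response follows by substituting s = jω, the response and its magnitude would read:

```latex
\begin{align*}
H(j\omega) &= \frac{1}{1 + j\omega T}, \qquad \omega = 2\pi f, \\
|H(j\omega)| &= \frac{1}{\sqrt{1 + \omega^{2} T^{2}}} .
\end{align*}
```

Under this assumption the gain falls to 1/√2 of its low-frequency value at f = 1/(2πT), roughly 1.6 Hz for T = 100ms and 0.8 Hz for T = 200ms.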

This response roughly reflects the temporal modulation
transfer function (MTF), which is defined as the reciprocal of
the just visible sine wave modulation amplitude. It is a
sensitivity response function of the eye with respect to
temporal frequency of the stimulus. Fig. 2A shows a frequency
response curve 27 of the leaky integrator model, and Fig. 2B
the frequency response curves of a typical temporal MTF, with
curves 31 and 33 having spatial frequencies of 0.5
cycles/degree and 16 cycles/degree, respectively.
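The shape of the leaky integrator curve can be explored numerically. The sketch below assumes the first-order low pass form H(s) = 1/(1 + sT), whose gain at frequency f is 1/sqrt(1 + (2πfT)^2); the function names and the 100ms default for T are choices made for the example.

```python
import math


def leaky_integrator_gain(frequency_hz, recovery_time_s=0.1):
    """Magnitude of the assumed first-order low pass at frequency_hz."""
    omega = 2 * math.pi * frequency_hz
    return 1 / math.sqrt(1 + (omega * recovery_time_s) ** 2)


def cutoff_frequency_hz(recovery_time_s=0.1):
    """Frequency at which the gain has dropped to 1/sqrt(2)."""
    return 1 / (2 * math.pi * recovery_time_s)
```

Evaluating `leaky_integrator_gain` over a range of temporal frequencies gives a monotonically falling curve: slow temporal variations pass almost unattenuated while rapid ones are suppressed, which is the low pass behaviour the model is meant to capture.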

There are two kinds of temporal masking, forward and
backward. It is forward masking when an arriving stimulus acts
forward in time to affect one which comes later, and backward
masking when the stimulus arriving later affects one which has
already come and gone. Because of these effects, in the coding
scheme the immediate past frame at a scene change can be
coarsely coded, as can the following frame. This effect was
verified in an experiment designed to detect any difference in
perceptual quality between the original frames and the frames
with a coarsely coded immediate past frame. Little perceptual
difference was detected even when the frame is coded with as
few as 20% of the number of bits of a regular frame.

In general, these forward and backward masking effects 
can be explained by two underlying processes in temporal 
masking. One is masking by light, which occurs when there is 
a drastic change in luminance. This effect is believed to be 
caused by lateral inhibition in the retina. The other is 
masking by visual processing noise, which occurs when there is 
an abrupt change in visual patterns. This can be explained by 
the degradation of spatial contrast by temporal summation. 
The combined effect of these two masking processes produces 
the forward and backward masking effects when there is a scene
change.

Temporally Adaptive Motion Interpolation (TAMI) Algorithm
Fixed Bit Rate Coding (FBR-TAMI):

The present new motion interpolation coding techniques
adopt some terminology from the MPEG standard, including I
(intra frame), P (predicted frame), and B (bidirectional
interpolated frame), and are generally compatible with
apparatus following present video coding standards. In one
embodiment of the invention, a temporally adaptive motion 
interpolation algorithm (hereinafter referred to as TAMI 
algorithm) was developed. One variation of this algorithm 
uses fixed bit rate coding, providing a FBR-TAMI, which is 
discussed below. 



In the TAMI algorithm, the interval between two reference 
frames is adapted according to the temporal variation of the 
input video, and the bit rate control takes advantage of 
temporal masking in human vision. The crucial issue in this 
approach is the bit rate control problem because the group of 
pictures (GOP) structure is dynamically changing. When a 
scene change occurs, it is desirable to code the first frame 
after the scene change as an I or intra frame, which might be 
impractical because the bit rate would increase drastically if 
there were many such scene changes in one GOP. 

This problem can be resolved by coarsely quantizing the 
new I or intra frame with the same number of bits as used for 
a regular B frame. This does not degrade the picture quality 
when the sequence is continuously displayed because of the 
forward temporal masking effect. It is known that if the bit 
rate (bandwidth) of frames following a scene change is 
gradually increased back to full bit rate (bandwidth), then
the degradation of the frames following a scene change is not 
perceptible. 

Using a poor quality intra frame after a scene change 
directly affects the picture quality of the following frames 
until a new intra frame is used, with the quality of the 
following frames becoming better over successive frames. This 
gradual improvement in quality is thereby achieved without a 
complex scheme for explicitly controlling bit allocation on a 
frame-by-frame basis. 



To detect significant temporal variations of the input
video, different temporal distance measures were considered in
developing the present invention. These distances are
actually a measure of the amount of global motion between
frames. These motion measures can be determined by the
difference of histogram (DOH), histogram of difference (HOD),
block histogram (BH), block variance (BV), and motion
compensation error (MCE), respectively. They are described in
detail below. In the present TAMI algorithm, six different
frame Types, I1, I2, P1, P2, B1, and B2, are used. Frame
Types I1, P1, and B1 are the same regular frame types as
defined in the MPEG standard. Frame Types I2 and P2 have the
same bit allocation as B1 frames; thus I2 and P2 are very
coarsely quantized intra and predicted frames, respectively.
On the other hand, B2 is an interpolation frame with more bits
than a regular B1 frame, and generally fewer bits than a P1
frame. An I1 designated frame is a full frame, and is finely
quantized.

In one embodiment for TAMI, an I1 frame is always the
first frame designation for each GOP. When the cumulative
motion or measured distance between an immediately preceding
reference frame and a successive frame in a GOP exceeds a Type
0 threshold, the immediately prior frame to the successive
frame is designated as a P1 frame. When the motion, or
measured distance between two successive frames exceeds a Type
1 threshold, the first or immediately prior frame is
designated as a P2 frame, and the second or immediately
following frame as an I2 frame. I1, I2, P1, and P2 frames are
reference frames used in interpolating B1 and B2 frames. In a
GOP where Type 0 scene changes occur, B1 frames are designated
between reference frames. In a GOP where no Type 0 scene
changes occur, B2 frames are designated between reference
frames, for example, as described below in greater detail. In
other words, B2 frames are used when no Type 0 scene changes
are detected in a GOP, whereby no P1 frames other than possible
default P1 frames result. Accordingly, the bits saved by not
requiring additional P1 frames may be distributed among B1
frames to cause them to become B2 frames with greater
resolution. Accordingly, the B2 frames are basically B1
frames with slightly higher bit allocation.
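The designation rules in this paragraph can be sketched in a few lines. The code is an illustrative reading rather than the patent's flowchart: `dist(a, b)` is a hypothetical callable returning the global motion between frames a and b, default P1 frames are omitted, and the case where the frame right after a reference already exceeds the Type 0 threshold is simply skipped, since the text does not say how it is handled.

```python
def designate_gop(num_frames, dist, type0_threshold, type1_threshold):
    """Label each frame of one GOP as I1, I2, P1, P2, B1 or B2."""
    types = ["B"] * num_frames
    types[0] = "I1"            # the first frame of a GOP is always I1
    last_ref = 0
    type0_seen = False
    for j in range(1, num_frames):
        if dist(j - 1, j) > type1_threshold:
            # Type 1 (abrupt) change: coarse P2 before it, coarse I2 after.
            if types[j - 1] == "B":
                types[j - 1] = "P2"
            types[j] = "I2"
            last_ref = j
        elif dist(last_ref, j) > type0_threshold and j - 1 > last_ref:
            # Type 0: motion accumulated since the last reference is too
            # large, so the frame just before this one becomes a P1 frame.
            types[j - 1] = "P1"
            last_ref = j - 1
            type0_seen = True
    # B frames receive the extra bits (B2) only if no Type 0 change occurred.
    b_label = "B1" if type0_seen else "B2"
    return [b_label if t == "B" else t for t in types]
```

With steady motion the sketch inserts P1 frames at regular intervals and labels the rest B1; with a single abrupt cut and no other motion it yields a P2/I2 pair at the cut and B2 frames elsewhere.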

In another embodiment of the invention, in a GOP a first 
occurring P frame is predicted from the immediately previous I 
frame. A subsequently occurring P frame is predicted from the 
immediately previous P frame or I2 frame, whichever is
closest. 

Fig. 3 shows the block diagram 10 for the TAMI algorithm.
The TAMI algorithm first looks at all the frames in the
current GOP, detects scene changes of Type 1 (using a Type 1
scene change detector or SCD 12), and detects scene changes of
Type 0 (using a Type 0 scene change detector or SCD 14). The
next steps determine the positions of P and B frames in the GOP
or group of pictures structure (using a GS output detector
16). Then, using the positions of P and B frames, the frames
are processed by a motion compensated interpolation encoder
18, which is similar to a typical motion compensation encoder
except that it uses telescopic motion vector search and a
different bit rate control mechanism, in this example. As
shown, encoder 18 includes a discrete cosine transform (DCT) 
module 4, a quantizer (Q) 5, a variable length code generator 
6, a buffer 7, an inverse quantizer (Q') 8, an inverse 
discrete cosine transform module (IDCT) 9, a motion estimator 
(ME) 15, and first and second summing junctions 17 and 19. 
Note that DCT 4 converts data from the time domain into the 
frequency domain, as is known in the art. 

The flowchart 10 of the TAMI algorithm is illustrated in 
Fig. 4, including details of the programming for scene change 
detectors 12 and 14. The algorithm 10 includes steps 100 
through 110, and 16, for programming a microprocessor, for 
example, to take the following generalized steps, using one 
GOP as a processing unit: 

A. It loads frames from one GOP into an associated 
frame memory included in the microprocessor (steps 
101 and 102). 

B. It detects the positions of scene change of Type 1 
via the Type 1 SCD 12 (steps 103, 106, and 107). 

C. It detects the positions of scene change of Type 0 
via the Type 0 SCD 14 (steps 104, 108, and 109). 

D. It determines the GOP structure, i.e., the positions 
of all I, P, and B frames via GS 16 (steps 105 and 16). 

E. It outputs the information generated in step "D" to 
the motion compensating encoder 18. 

Two types of scene detectors 12 and 14 are required 
for processing the algorithm, as shown. In Fig. 4, the first 
detector 12 declares a scene change of Type 1 for the current 
frame when the distance or relative movement measure between 
the current frame $f_n$ and the immediate past frame $f_{n-1}$ 
is above a threshold $T_1$ (step 103). This type of scene 
change corresponds to an actual scene content change; the 
current frame is coded as an I2 frame (very coarsely quantized 
intra frame), and the immediate past frame $f_{n-1}$ is coded 
in step 106 as a P2 frame (very coarsely quantized predicted 
frame). The I2 frame coding exploits the forward temporal 
masking effect, and the P2 frame coding takes advantage of the 
backward temporal masking effect. 

The second detector 14 detects scene changes of Type 0. 
This implements a temporal segmentation algorithm for 
processing. This algorithm, as shown in Fig. 4, declares the 
current frame $f_n$ as a scene change of Type 0 when the 
distance or relative movement measured between the current 
frame $f_n$ and the last reference frame $f_{ref}$ is above a 
threshold $T_0$ (see step 104). This time the immediate past 
frame $f_{n-1}$ becomes a P1 frame, which is a regular 
predicted frame. The bit allocation strategy for the temporal 
segmentation is that every end frame of a temporal segment 
should become a P1 frame, and that the frames in between 
should be B1 or B2 frames depending on whether the extra P1 
frame is being used or not. 

As a result of experimentation, one modification was 
made. The number of extra P1 frames was set to one because 
more than two extra P1 frames were found to cause quality 
degradation, while zero extra P1 frames means no 
adaptability. The reason for the degradation when using many 
P1 frames is explained as follows. Since fixed bit rate 
coding with a feedback loop 21 between the buffer 7 and the 
quantizer (Q) 5 is being used (see Fig. 3), the bit rate for 
successive P1 frames is gradually reduced (i.e., coarsely 
quantized) by the feedback loop 21. However, use of the 
coarse P1 frame produces adverse effects. It degrades not 
only the quality of the P1 frames but also the B frames in 
between. It is preferred to limit the number of coarsely 
quantized P1 frames in high-motion segments, whereby the P1 
frames eliminated are replaced by B1 frames. The degradation 
effect with many or unlimited P1 frames can be easily seen in 
Fig. 5, in comparison with the limited P1 scheme, relative to 
curves 23 and 25, respectively. 
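
The two detectors and the cap on extra P1 frames can be 
illustrated with a minimal sketch. It is a simplification under 
stated assumptions, not the flowchart of Fig. 4: the function name 
`designate_frames`, its parameters, and the decision to skip a 
Type 0 event once the cap is reached are all hypothetical, and 
`distance` stands for any of the distance measures. 

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def designate_frames(
    frames: Sequence[Frame],
    distance: Callable[[Frame, Frame], float],
    t1: float,
    t0: float,
    max_extra_p1: int = 1,
) -> List[str]:
    """Label each frame of one GOP as I1, I2, P1, P2, B1 or B2 (sketch)."""
    labels = ["B"] * len(frames)
    labels[0] = "I1"   # the first frame of a GOP is always I1
    ref = 0            # index of the last reference frame
    extra_p1 = 0       # P1 frames inserted by Type 0 detection

    for n in range(1, len(frames)):
        if distance(frames[n], frames[n - 1]) > t1:
            # Type 1: abrupt change between two successive frames.
            labels[n] = "I2"
            if labels[n - 1] == "B":
                labels[n - 1] = "P2"
            ref = n
        elif (
            n - 1 > ref
            and extra_p1 < max_extra_p1
            and distance(frames[n], frames[ref]) > t0
        ):
            # Type 0: motion accumulated since the last reference frame.
            labels[n - 1] = "P1"
            ref = n - 1
            extra_p1 += 1

    # B1 when the extra P1 frame was spent, B2 (more bits each) otherwise.
    b_label = "B1" if extra_p1 > 0 else "B2"
    return [b_label if lab == "B" else lab for lab in labels]
```

With scalar stand-ins for frames and an absolute difference as the 
distance, `designate_frames([0, 1, 2, 3, 4, 50, 51, 52], lambda a, b: abs(a - b), t1=20, t0=2.5)` 
yields one P1 frame at the end of the slow drift and a P2/I2 pair 
around the jump. 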

Two different bit allocation schemes may be used. The 
first scheme is a constant bit allocation scheme where the bit 
allocation for each picture type except B is always constant 
from GOP to GOP regardless of the number of detected SSPs 
(scene segmentation points, i.e. scene changes of Type 0). 
The constant bit allocation scheme is more suitable for some 
applications for two reasons. First, the picture quality of I 
and P frames varies more than the quality of a B frame does 
when a variable bit allocation scheme is used. Second, since 
an I frame is repeated every 15 frames (every 1/2 sec), a 
problem of 2 Hz blinking may be encountered if the I frame bit 
allocation is allowed to vary. The second scheme is a 
variable bit allocation scheme where the bit allocation for 
each picture type is allowed to change from GOP to GOP 
according to the number of detected SSPs, but the ratio of bit 
allocations between different picture types is always 
constant. When the constant bit allocation scheme is used, 
one cannot afford high variability of the number of P frames 
because of the constrained bit budget (fixed bit rate coding 
or constant bit rate coding). Hence, to have some 
adaptability, in one embodiment the variation of the number of 
P frames is limited to 1, which can easily be implemented by 
using two different bit allocations for B frames (B1 and B2 
frames), i.e., B1 frames are used if an extra P frame is used 
and B2 frames are used if not. The variable bit allocation 
scheme is used in the adaptation of the number of P frames, as 
described later, to allow a variable number of P frames in a 
GOP. The following paragraphs describe constant bit rate 
allocations in greater detail. 

Fixed Bit Rate Coding (FBR-TAMI): 

The present TAMI algorithm uses a simple rate control 
strategy which is based upon regulation of the number of bits 
inputted to a buffer 7 (see Fig. 3), whereby if during coding 
of one frame the buffer 7 begins to fill up, the system is 
programmed to automatically reduce the number of bits 
generated in coding the next frame. After the bits are 
allocated via GS output 16 for the different picture types 
(I1, I2, P1, P2, B1, B2), the target bits for a slice are 
computed via encoder 18. Note that a slice is defined in the 
MPEG standard as a series of macroblocks (one of the layers of 
the coding syntax). Also, a macroblock is defined in the MPEG 
standard as "the four 8-by-8 blocks of luminance data and the 
two corresponding 8-by-8 blocks of chrominance data coming from 
a 16-by-16 section of the luminance component of the picture." 
At the end of each slice, the buffer 7 content is monitored. 
If it is above or below the target buffer content, the QP 
(Quantization Parameter) is reduced or increased respectively 
(slice level bit rate control). To be adaptive to the changing 
coding difficulties of an incoming video sequence, the bits 
produced in the current picture are used as the target bits 
for the next occurrence of a frame of the same picture type. 
When the bits produced for the previous frame are more or less 
than the target bits, the bit allocation for the next picture 
is adjusted to maintain an acceptable range of bit rate per 
second by the equation (frame level rate control): 



$$ TB = XTB \cdot \frac{TBR_{GOP}}{ABR_{GOP}} \qquad (4) $$

where TB is the target bit allocation, $ABR_{GOP}$ is the actual 
GOP bit rate, $TBR_{GOP}$ is the target GOP bit rate, and XTB is 
the target bit allocation for the previous frame. 
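
The frame level adjustment can be sketched in a few lines. This 
is a minimal illustration that reads the equation as scaling the 
previous target by the ratio of target to actual GOP bit rate; the 
function and parameter names are hypothetical. 

```python
def next_target_bits(prev_target_bits: float,
                     target_gop_rate: float,
                     actual_gop_rate: float) -> float:
    """Frame level rate control: TB = XTB * TBR_GOP / ABR_GOP (sketch)."""
    if actual_gop_rate <= 0:
        raise ValueError("actual GOP bit rate must be positive")
    # Overshooting the target rate shrinks the next allocation,
    # undershooting enlarges it.
    return prev_target_bits * target_gop_rate / actual_gop_rate
```

For example, if the last GOP produced more bits than targeted, the 
next frame of the same picture type receives proportionally fewer 
target bits. 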

There is a difficulty with the motion estimation because 
the frame interval between the current and the reference frame 
can be as large as 15 (for GOP size 15), which means that the 
search region for a motion vector can be as large as 105, 
assuming the search region for adjacent frames is 7. Using 
Mean Absolute Difference (MAD) as the matching criterion for 
the full search, this would require about $2.5 \times 10^{11}$ 
operations per second for a sequence in the GIF format (352 by 
240 pixels, 30 frames/sec). To reduce this computational 
complexity, a telescopic search as mentioned in the MPEG 
standard is used because it is believed to be the most suitable 
for the long interval between reference frames, and provides 
very good motion compensating performance in computer 
simulations. An example to search motion vectors between four 
frames is given in Fig. 6. The basic idea of the telescopic 
search is that the center position of the search window for a 
block in the current frame is shifted by the motion vector 
obtained for a corresponding macroblock in the previous frame, 
repeating until the last frame is reached. Accordingly, as 
shown, vector 22 shifts the center of the search window from 
the center of block 20 to the center of block 24 in Frame 1; 
vector 26 shifts the center of the search window from the 
center of block 24 to the center of block 28 in Frame 2; and 
vector 30 shifts the center of the search window from the 
center of block 28 to the center of block 32 in Frame 3. 
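
The telescopic idea can be sketched as follows. This is a minimal 
illustration, not the encoder's implementation: frames are assumed 
to be lists of rows of integer luminance values, the block of 
frame 0 is tracked forward, and the names `mad`, `window_search`, 
and `telescopic_search` are hypothetical. 

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]
Vector = Tuple[int, int]  # (dx, dy)


def mad(src: Frame, dst: Frame, bx: int, by: int,
        dx: int, dy: int, bs: int) -> float:
    """Mean absolute difference between the block at (bx, by) in src
    and the block displaced by (dx, dy) in dst."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(src[by + y][bx + x] - dst[by + dy + y][bx + dx + x])
    return total / (bs * bs)


def window_search(src: Frame, dst: Frame, bx: int, by: int, bs: int,
                  center: Vector, radius: int) -> Vector:
    """Full search over a (2*radius+1)^2 window centred on `center`."""
    height, width = len(dst), len(dst[0])
    best_cost: Optional[float] = None
    best_vec = center
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + bs > width or y0 + bs > height:
                continue  # candidate block falls outside the frame
            cost = mad(src, dst, bx, by, dx, dy, bs)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dx, dy)
    return best_vec


def telescopic_search(frames: List[Frame], bx: int, by: int,
                      bs: int = 16, radius: int = 7) -> List[Vector]:
    """Track the block at (bx, by) of frames[0] through frames[1:].

    Each window is centred on the vector found for the previous frame,
    so every step costs one small search even when the accumulated
    displacement is large.
    """
    vectors = []
    center = (0, 0)
    for frame in frames[1:]:
        center = window_search(frames[0], frame, bx, by, bs, center, radius)
        vectors.append(center)
    return vectors
```

The design point is that the window size stays fixed per step while 
the reachable displacement grows with the frame interval. 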

N-P TAMI algorithm: 

There are two problems in the TAMI algorithm. One is its 
tendency to produce longer intervals between reference frames. 
This promotes perceptually displeasing coding errors around 
moving edges, which is well known to be typically associated 
with any block motion compensation coding. This error occurs 
because a macroblock may consist of two objects (usually, one 
is a moving object and the other is background) which are 
moving in two different directions. Since each block is 
motion compensated by a single motion vector, this produces 
residuals having mainly high frequency components that are 
lost in the quantization stage because of coarser quantization 
for high frequency DCT coefficients. The other problem in the 
algorithm is that longer delay is inevitable because all the 
frames in a GOP have to be stored before the TAMI algorithm 
runs. The longest encoding delay in this case is one GOP 
size, which is usually fifteen frames, corresponding to 1/2 
second when the frame rate is 30 frames/sec. 

To alleviate these problems, generalizations of the TAMI 
algorithm or programming steps were developed. To reduce the 
encoding delay and distances between reference frames, N P1 
frames are inserted into the GOP structure by default. The 
modified program is herein referred to as the N-P TAMI 
programming steps or algorithm. Note "N" is the number of 
default P frames. The modified program allows for the choice 
of N from 0 to 3, and produces four different schemes, namely, 
0-P scheme (IBBBBBBBBBBBBBB), 1-P scheme (IBBBBBBBPBBBBBBB), 
2-P scheme (IBBBBPBBBBPBBBB), and 3-P scheme 
(IBBBPBBBPBBBPBBB). For even N, the GOP size must be fifteen 
frames to have even spacing between reference frames. For odd 
N, the GOP size should be sixteen frames for the same reason. 
Figs. 7A through 7D show the default GOP structures for N = 
0, 1, 2, and 3, respectively. Note that for N = 4, this is 
equivalent to the conventional implementation of the MPEG 
standard. Fig. 8A also shows an example of the GOP structure 
generated by the present 1-P TAMI algorithm, when there is a 
scene change of Type 1, and at least one scene change of Type 
0. Note that as described above, B1 frames are designated 
between any pair of reference frames, i.e., I1, I2, P1, and/or 
P2, in this example. If no Type 0 scene changes are detected, 
as shown in Fig. 8B, for example, B2 frames are designated 
between reference frames, in that at least one less P1 frame 
is designated relative to the example of Fig. 8A, permitting 
more bits to be used for the B frames. As N increases, 
smaller encoding delay and smaller inter-reference frame 
intervals are encountered, but bit rate savings are reduced. 

Assume that bit allocations (Kbit/frame) are |I1| = 
180.0, |I2| = 6.75, |P1| = 100.5, |P2| = 6.75, |B1| = 6.75. 
The relationship between B2 and B1 is as follows: 

$$ |B2| = |B1| + \frac{|P1| - |B1|}{M - N - 1} \qquad (5) $$

where N is the number of P1 frames used, and M is the GOP 
size. 

The bit rates per second, BR, for the four schemes are 
derived, for example, from the allocations via the following 
computations: 

I. Conventional fixed 4-P GOP structure 
(IBBPBBPBBPBBPBB): BR = 1299 Kbit/sec. 

II. 0-P scheme: 
BR = 736.5 Kbit/sec, 56.7% of the fixed scheme. 

III. 1-P scheme: 
BR = 878.9 Kbit/sec, 67.7% of the fixed scheme. 

IV. 2-P scheme: 
BR = 1111.5 Kbit/sec, 85.6% of the fixed scheme. 

V. 3-P scheme: 
BR = 1230 Kbit/sec, 94.7% of the fixed scheme. 
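
The quoted bit rates can be reproduced with a short sketch. It 
rests on two assumptions inferred from the numbers rather than 
stated outright: each N-P GOP is budgeted for one I1 frame, N 
default P1 frames plus one extra P1 frame, and B1 frames for the 
rest; and a B2 frame receives |B1| plus an equal share of the 
unused extra P1 frame's surplus. Function names are hypothetical. 

```python
FRAME_RATE = 30.0                 # frames per second
I1, P1, B1 = 180.0, 100.5, 6.75   # Kbit/frame, as listed in the text


def b2_bits(m: int, n: int) -> float:
    """B2 allocation when the extra P1 frame is not used (assumed form)."""
    return B1 + (P1 - B1) / (m - n - 1)


def gop_bits(m: int, n: int, extra_p1: bool = True) -> float:
    """Kbit per GOP of size m with n default P1 frames."""
    if extra_p1:
        return I1 + (n + 1) * P1 + (m - n - 2) * B1
    return I1 + n * P1 + (m - n - 1) * b2_bits(m, n)


def bit_rate(m: int, n: int, extra_p1: bool = True) -> float:
    """Kbit/sec for the N-P scheme."""
    return gop_bits(m, n, extra_p1) * FRAME_RATE / m


def conventional_4p_rate() -> float:
    """Kbit/sec for the fixed IBBPBBPBBPBBPBB structure."""
    return (I1 + 4 * P1 + 10 * B1) * FRAME_RATE / 15
```

Under these assumptions the GOP budget is the same whether the 
extra P1 frame is spent (B1 frames) or not (B2 frames), which is 
what keeps the output bit rate fixed. 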

Variable Bit Rate Coding (VBR-TAMI): 

Spurred by the recent advancement of the ATM 
(Asynchronous Transfer Mode) concept of B-ISDN (Broadband 
Integrated Services Digital Network) technology, 
variable bit rate coding (packet video) is becoming a very 
promising scheme for network-oriented video transmission. 
This scheme relaxes the bit rate control restrictions on the 
encoder and enables constant picture quality transmission 
instead of constant bit rate. 

In the above-described fixed bit rate coding embodiment 
for FBR-TAMI, the number of P1 frames is limited because of 
the fixed output bit rate constraint. As a result, the output 
bit rate is maintained at the cost of degradation of picture 
quality in intervals where there is high motion activity. If 
the restrictions on the number of P1 frames and the feedback 
loop for bit rate control are removed, the TAMI algorithm 
becomes a VBR (Variable Bit Rate) encoder and produces 
constant picture quality (perceptual picture quality, not in 
terms of constant SNR) by inserting more P1 frames into 
temporally busy regions. Hence, the VBR-TAMI encoder will 
compress video data much more than a conventional fixed GOP 
structure encoder or the FBR-TAMI encoder. 



Fig. 9 shows the block diagram of one embodiment of the 
present VBR-TAMI encoder 34. Compared to the FBR-TAMI encoder 
18 of Fig. 3, the VBR-TAMI encoder 34 does not include a 
buffer 7, or a rate control feedback 21 for maintaining a 
fixed bit rate. Instead, the network 36 acts as a buffer, and 
a network estimator 35 is included in a network feedback loop 
37 between network 36, and quantizer 5 and variable length 
coder 6. 

The flowchart for the VBR-TAMI algorithm 38 is shown in 
Fig. 10. The nump1 statement in step 104 of the FBR-TAMI 
algorithm 10 of Fig. 4, for limiting the number of P1 frames, 
is not required in step 104' of the VBR-TAMI algorithm 38 of 
Fig. 10. This is the only difference between algorithms 10 
and 38. The use of B2 frames in the VBR-TAMI algorithm 38 is 
meaningless because use of any number of P1 frames is now 
allowed. 

In the VBR-TAMI algorithm 38, when there is a scene 
change of Type 1, temporal masking is also applied to the two 
frames at the scene change (i.e., the preceding frame is a P2 
frame and the following one is an I2 frame). 

The following is a simple bit rate performance analysis 
for the VBR-TAMI encoder 34. For simplicity assume that the 
P1 event (i.e., declaration of scene change Type 0) is a 
Bernoulli random variable. Then the number of P1 frames in a 
GOP is a random variable K with a binomial distribution, as 
shown by the following equation: 

$$ P[K = k] = P_{M-1}(k) = \binom{M-1}{k} p^k (1-p)^{M-1-k} \qquad (6) $$

where M is the GOP size, k is the number of P1 events, and p 
is the probability of having a P1 event at a frame. The mean 
of this distribution is $\bar{k} = E[K] = (M - 1)p$. (M - 1) is 
the number of possible positions for P1 because the first frame 
of a GOP always has to be an I1 frame. Although it is not 
exactly correct because of the exclusion of the first frame in 
a GOP, the interarrival distribution of P1 arrivals can be 
modeled by a geometric distribution as follows: 

$$ P(T) = p (1-p)^{T-1} \qquad (7) $$

where T = 1, 2, ... is the interarrival time between 
successive P1 frames. The mean of this distribution is given 
by $E[T] = 1/p$, and $p = \bar{k}/(M - 1)$ from the binomial 
distribution. From this one obtains $\bar{k} = (M - 1)/E[T]$. 

The mean and variance of the output bit rate with $\bar{k}$ as 
a parameter can now easily be computed, to obtain a rough 
measure for motion activity in the input video. For example, 
assume there are k P1-frame events in a GOP. Then one will 
have one I1 frame, k P1 frames, and (M-1-k) B1 frames, and the 
bit rates are computed as follows: 

$$ R_{GOP}(k) = R_{I1} + k R_{P1} + (M - 1 - k) R_{B1} \qquad (8) $$

$$ R(k) = \frac{30}{M} R_{GOP}(k) \qquad (9) $$

where $R_{GOP}(k)$ is the GOP bit rate, $R_{I1}$, $R_{P1}$, and 
$R_{B1}$ are bit allocations for I1, P1, and B1 frames, and R(k) 
is the bit rate per second. Then the mean and variance of the 
bit rate are given by the following equations: 

$$ E[R] = \frac{30}{M} \left( (R_{P1} - R_{B1}) \bar{k} + R_0 \right) \qquad (10) $$

$$ \sigma_R^2 = \left( \frac{30}{M} \right)^2 (R_{P1} - R_{B1})^2 (M - 1) p (1 - p) \qquad (11) $$

where $R_0 = R_{I1} + (M - 1) R_{B1}$, which is the bit rate when 
there is no P1 frame. This shows that the mean bit rate is 
linearly proportional to the expected number of arrivals, and 
that the variance is at its maximum when the arrivals are the 
most uncertain (i.e., p = 1/2). 

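As an example, M = 15, $\bar{k}$ = 2, $R_{I1}$ = 180, 
$R_{P1}$ = 100.5, and $R_{B1}$ = 6.75 provide E[R] = 924 Kbit/sec 
and $\sigma_R$ = 245.5 Kbit/sec. Similarly for the $\ell$-P 
scheme, the following equations apply: 

$$ R_{GOP}(k) = R_{I1} + (k + \ell) R_{P1} + (M - 1 - \ell - k) R_{B1} \qquad (12) $$

$$ R(k) = \frac{30}{M} R_{GOP}(k) \qquad (13) $$

The following expressions for mean E[R], and variance 
$\sigma_R^2$, also apply: 

$$ E[R] = \frac{30}{M} \left( (R_{P1} - R_{B1}) \bar{k} + R_0(\ell) \right) \qquad (14) $$

$$ \sigma_R^2 = \left( \frac{30}{M} \right)^2 (R_{P1} - R_{B1})^2 (M - 1 - \ell) p (1 - p) \qquad (15) $$

where $R_0(\ell) = R_{I1} + \ell R_{P1} + (M - 1 - \ell) R_{B1}$, 
which is the bit rate when there are $\ell$ P1 frames. 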
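
The quoted example values can be checked numerically. The sketch 
below assumes a frame rate of 30 frames/sec, treats the quoted 
245.5 Kbit/sec as a standard deviation, and takes the number of 
free positions as M - 1 - l for l default P1 frames; the function 
names are hypothetical. 

```python
import math

FRAME_RATE = 30.0  # frames per second (assumed from the 30 frames/sec format)


def rate(k: int, m: int, r_i1: float, r_p1: float, r_b1: float,
         l: int = 0) -> float:
    """Bit rate per second with k extra and l default P1 frames in a GOP."""
    gop = r_i1 + (k + l) * r_p1 + (m - 1 - l - k) * r_b1
    return FRAME_RATE / m * gop


def rate_mean_std(k_mean: float, m: int, r_i1: float, r_p1: float,
                  r_b1: float, l: int = 0):
    """Closed-form mean and standard deviation of the bit rate."""
    slots = m - 1 - l                      # positions open to P1 events
    p = k_mean / slots                     # per-frame P1 probability
    r0 = r_i1 + l * r_p1 + slots * r_b1    # GOP bits with no extra P1
    mean = FRAME_RATE / m * ((r_p1 - r_b1) * k_mean + r0)
    var = (FRAME_RATE / m) ** 2 * (r_p1 - r_b1) ** 2 * slots * p * (1 - p)
    return mean, math.sqrt(var)


def rate_mean_by_enumeration(k_mean: float, m: int, r_i1: float,
                             r_p1: float, r_b1: float, l: int = 0) -> float:
    """Same mean, summed directly over the binomial distribution."""
    slots = m - 1 - l
    p = k_mean / slots
    return sum(
        math.comb(slots, k) * p ** k * (1 - p) ** (slots - k)
        * rate(k, m, r_i1, r_p1, r_b1, l)
        for k in range(slots + 1)
    )
```

Calling `rate_mean_std(2, 15, 180.0, 100.5, 6.75)` corresponds to 
the worked example with M = 15 and a mean of two P1 events per GOP. 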

Distance Measures for Temporal Segmentation: 

Five different distance measures for temporal 
segmentation will now be considered, as an example. First, 
notation must be defined. The number of pixels of an image is 
denoted by $N_{pix}$; the width by W; the height by H; the number 
of luminance levels by q; and the frame number index by n. Then 
an image sequence is defined by 

$$ \mathcal{F} = \{ f_n \mid f_n : L_x \times L_y \to F,\ n = 0, 1, 2, \ldots \} \qquad (16) $$

where $L_x = \{0, 1, \ldots, W-1\}$, $L_y = \{0, 1, \ldots, H-1\}$, 
and $F = \{0, 1, \ldots, (q-1)\}$. The corresponding histogram 
sequence is defined by: 

$$ \mathcal{H} = \{ h_n \mid h_n : F \to Z^+,\ n = 0, 1, 2, \ldots \} \qquad (17) $$

where $Z^+$ is the set of all nonnegative integers. The 
histogram operator $\mathbf{H}$ from an image to a histogram is 
defined as: 

$$ h_n = \mathbf{H} f_n \qquad (18) $$

where $\mathbf{H} : \mathcal{F} \to \mathcal{H}$. 



1) Difference of histograms (DOH): The distance 
measure between $f_n$ and $f_m$ is defined by the L1 norm of 
their histogram difference as follows: 

$$ D(f_n, f_m) = \| h_n - h_m \|_1 = \sum_{i=0}^{q-1} | h_n(i) - h_m(i) | \qquad (19) $$

Researchers have reported that the luminance histogram is 
a very efficient index for image content. The histogram 
difference between two pictures can thus be a good measure of 
the correlation of the content between them. Another 
important advantage of using the DOH distance measure is its 
insensitivity to local motion activities, regardless of the 
speed of the motion (e.g., an object moving in a stationary 
background), compared to its sensitivity to global motion 
activities such as zooming, panning and scene changes, since a 
good temporal segmentation should effectively detect global 
changes and not be too sensitive to local motion activity that 
can be compensated for by a typical motion estimation 
algorithm. 

The DOH is better for detecting global changes rather 
than for detecting local motion. 
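
A minimal sketch of the DOH measure follows, assuming a frame is 
a list of rows of integer luminance values in the range 0 to q-1. 
The function names are hypothetical. 

```python
from typing import List

Frame = List[List[int]]


def histogram(frame: Frame, q: int = 256) -> List[int]:
    """Count how many pixels take each of the q luminance levels."""
    counts = [0] * q
    for row in frame:
        for value in row:
            counts[value] += 1
    return counts


def doh(f_n: Frame, f_m: Frame, q: int = 256) -> int:
    """Difference of histograms: L1 norm of the histogram difference."""
    h_n, h_m = histogram(f_n, q), histogram(f_m, q)
    return sum(abs(a - b) for a, b in zip(h_n, h_m))
```

Because only the level counts matter, rearranging pixels within a 
frame leaves the measure at zero, which is the insensitivity to 
local motion noted above. 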

2) Histogram of difference image (HOD): The histogram 
of differences between two images is denoted by: 

$$ HOD(f_n, f_m) = \mathbf{H}(f_n - f_m) \qquad (20) $$

where HOD is a function defined as 
$hod : \{-(q-1), -(q-2), \ldots, -1, 0, 1, \ldots, q-1\} \to Z^+$. 

Note that this is essentially the same quantity as the 
summation of the entries of the co-occurrence matrix along 
lines parallel to the diagonal. If there are more pixels far 
from the origin of HOD, it means that there are more changes 
in the image. The movement criterion can be roughly defined 
by the ratio of the counts at nonzero positions to the total 
number of counts in HOD. Hence the distance measure is 
defined as follows: 

$$ D(f_n, f_m) = \frac{\sum_{i \notin [-\alpha, \alpha]} hod(i)}{\sum_{i=-q+1}^{q-1} hod(i)} \qquad (21) $$

where $\alpha$ is a threshold for determining the closeness of 
the position to zero. This HOD measure has somewhat different 
characteristics than DOH. HOD is much more sensitive to local 
motion than DOH. 
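
A minimal sketch of the HOD measure, reading the distance as the 
fraction of pixel differences whose magnitude exceeds the 
closeness threshold. Frames are assumed to be equally sized lists 
of rows of integers; the names `hod`, `hod_distance`, and `alpha` 
are hypothetical. 

```python
from collections import Counter
from typing import List

Frame = List[List[int]]


def hod(f_n: Frame, f_m: Frame) -> Counter:
    """Histogram of the pixelwise difference image f_n - f_m."""
    return Counter(
        a - b
        for row_n, row_m in zip(f_n, f_m)
        for a, b in zip(row_n, row_m)
    )


def hod_distance(f_n: Frame, f_m: Frame, alpha: int = 0) -> float:
    """Share of counts lying outside [-alpha, alpha] in the HOD."""
    counts = hod(f_n, f_m)
    total = sum(counts.values())
    far = sum(c for diff, c in counts.items() if abs(diff) > alpha)
    return far / total
```

Unlike DOH, swapping two regions of a frame changes the difference 
image at every affected pixel, so the measure reacts to local 
motion. 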

3) Block histogram difference (BH): In HOD, a problem 
is that local motion information is lost by computing a global 
histogram. This problem can be reduced by computing 
histograms of each block, and summing the absolute difference 
of each block histogram between two frames. Let the total 
number of macroblocks per frame be denoted by mbnum. For a 
given b-th macroblock of frame $f_n$, the block histogram is 
defined as follows: 

$$ h_n(b, \cdot) = \mathbf{H}_b f_n \qquad (22) $$

where $\mathbf{H}_b$ is the histogram generator for the b-th 
macroblock, and $b \in [0, 1, \ldots, mbnum - 1]$. The distance 
measure is defined by: 

$$ D(f_n, f_m) = \sum_{b} \sum_{i} | h_n(b, i) - h_m(b, i) | \qquad (23) $$

where $b \in [0, 1, \ldots, (mbnum - 1)]$ is the index number for 
a macroblock and $i \in [0, 1, \ldots, (q - 1)]$. 
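
A minimal sketch of the block histogram difference, assuming 
square blocks of side `bs` scanned in raster order and frames given 
as lists of rows of integers in the range 0 to q-1. The function 
names and the block size parameter are hypothetical. 

```python
from typing import List

Frame = List[List[int]]


def block_histograms(frame: Frame, bs: int, q: int) -> List[List[int]]:
    """One q-bin histogram per bs-by-bs block, in raster order."""
    height, width = len(frame), len(frame[0])
    hists = []
    for by in range(0, height, bs):
        for bx in range(0, width, bs):
            counts = [0] * q
            for y in range(by, min(by + bs, height)):
                for x in range(bx, min(bx + bs, width)):
                    counts[frame[y][x]] += 1
            hists.append(counts)
    return hists


def bh_distance(f_n: Frame, f_m: Frame, bs: int = 16, q: int = 256) -> int:
    """Sum over blocks and levels of |h_n(b, i) - h_m(b, i)|."""
    return sum(
        abs(a - b)
        for h_n, h_m in zip(block_histograms(f_n, bs, q),
                            block_histograms(f_m, bs, q))
        for a, b in zip(h_n, h_m)
    )
```

Two frames whose content merely swaps between blocks have 
identical global histograms, yet the block-wise sum is nonzero. 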

4) Block variance difference (BV): The idea of using 
this measure is the same as for the block histogram difference 
except that the variance is used instead of the histogram. 
The distance is defined by the sum of absolute 
differences of the block variances between two frames, which is 
given by: 

$$ D(f_n, f_m) = \sum_{b} | var_n(b) - var_m(b) | \qquad (24) $$

where $b \in [0, 1, \ldots, (mbnum - 1)]$. Like the block 
histogram difference, this approach is made sensitive to local 
motion activities by computing the differences block by block. 
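
A minimal sketch of the block variance difference, assuming the 
population variance of the pixel values in each square block of 
side `bs`; the choice of population rather than sample variance and 
the function names are assumptions. 

```python
from typing import List

Frame = List[List[int]]


def block_variances(frame: Frame, bs: int) -> List[float]:
    """Population variance of each bs-by-bs block, in raster order."""
    height, width = len(frame), len(frame[0])
    variances = []
    for by in range(0, height, bs):
        for bx in range(0, width, bs):
            pixels = [
                frame[y][x]
                for y in range(by, min(by + bs, height))
                for x in range(bx, min(bx + bs, width))
            ]
            mean = sum(pixels) / len(pixels)
            variances.append(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return variances


def bv_distance(f_n: Frame, f_m: Frame, bs: int = 16) -> float:
    """Sum over blocks of |var_n(b) - var_m(b)|."""
    return sum(
        abs(a - b)
        for a, b in zip(block_variances(f_n, bs), block_variances(f_m, bs))
    )
```

Each block contributes a single number here instead of a full 
histogram, so this measure is cheaper to store and compare than the 
block histogram difference. 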

5) Motion compensation error (MCE): Suppose frame $f_m$ is 
predicted from $f_n$ by motion estimation. Coding 
difficulty is directly determined by the error between $f_m$ and 
$\hat{f}_m$, which is a prediction from $f_n$, so this motion 
compensation error can provide a measure for the coding of the 
error image between $f_n$ and $f_m$. Hence the distance measure 
using this error is defined by the following equation: 



Since this measure is computed directly from the 
prediction error, it is the nearly ideal measure for the 
coding difficulty of prediction error. However, the best 
measure would be the number of bits generated by this image 
coder, but this is unrealizable because it would require the 
encoding results in the preprocessing stage. This approach 
using motion compensation error is near-optimal but the 
drawback is that it is computationally too expensive. 



Optimal Spacing Algorithm: 



In another embodiment of the invention, the basic TAMI 
algorithm was improved for the detection of scene change Type 
0. It was recognized that since a fixed GOP size is being 
used, the basic TAMI algorithm may not produce the best 
possible spacing between reference frames. A description 
follows for the modification developed to improve the spacing, 
by providing an optimal spacing algorithm, in a preferred 
embodiment. As will be shown, in using the optimal spacing 
algorithm (OSA), Type 0 scene change detectors are not used; 
only Type 1 scene changes are detected. 

The flowchart of the OSA algorithm 60 for a 2-P scheme is 
given in Fig. 11. The algorithm 60 takes the following steps: 

1. In step 61, GOP frames are loaded into memory, 
and also difference measures for each frame in a GOP 
are generated within step 61. 

2. Using a distance measure between two adjacent 
frames, scene changes of Type 1 are detected in step 
62. The frame just before the scene change is 
declared as a P2 frame, and the frame just after the 
scene change as an I2 frame (i.e., temporal masking 
is also being used in this scheme). Note that Figs. 
4 and 10 show the steps 106 and 107 for 
accomplishing the determination of P2 and I2 frames 
via a Type 1 scene change detector. The P frames 
corresponding to these points are not included in 
the total number of P frames (i.e., P1 frames). 

3. An exhaustive search (steps 63-69) is used to 
find the best positions of P1 frames for 
equidistance, i.e., minimizing the deviation from 
the mean of the distances between the positions that 
include candidate P1 frames that would have been 
designated by Type 0 detection and the points of 
scene change Type 1, from which the GOP structure is 
determined in step 70. 

The deviation from the mean in step "3" above can be 
defined using the following notations. Suppose that the GOP 
is partitioned into s segments where each segment consists of 
two reference frames. Define the first and the last frame 
numbers of the ith segment by fpn(i) and lpn(i). The 
distance for the ith segment can be expressed as: 

$$ d_i = D(f_{fpn(i)}, f_{lpn(i)}) \qquad (26) $$

where D is the distance measure. Then the deviation, dev, is: 

$$ dev = \sum_{i=1}^{s} | d_i - \bar{d} | \qquad (27) $$

where 

$$ \bar{d} = \frac{1}{s} \sum_{i=1}^{s} d_i \qquad (28) $$

Fig. 12 shows an example of the GOP structure using 2-P 
optimal spacing algorithm when there is a scene change 
of Type 1. 
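
The exhaustive search can be sketched for the simplest case, a 
GOP with no Type 1 scene changes. This is an illustration under 
assumptions, not the flowchart of Fig. 11: the last frame passed in 
is treated as the closing reference of the final segment, ties are 
resolved by iteration order, and the function names are 
hypothetical. 

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def deviation(ref_positions: Sequence[int], frames: Sequence[Frame],
              distance: Callable[[Frame, Frame], float]) -> float:
    """Sum of |d_i - mean(d)| over the segments between reference frames."""
    d = [
        distance(frames[first], frames[last])
        for first, last in zip(ref_positions, ref_positions[1:])
    ]
    mean = sum(d) / len(d)
    return sum(abs(x - mean) for x in d)


def optimal_p1_positions(frames: Sequence[Frame],
                         distance: Callable[[Frame, Frame], float],
                         n_p1: int) -> Tuple[int, ...]:
    """Try every placement of n_p1 P1 frames and keep the most even one."""
    last = len(frames) - 1
    candidates = range(1, last)  # frame 0 is I1; `last` closes the GOP
    return min(
        combinations(candidates, n_p1),
        key=lambda c: deviation((0, *c, last), frames, distance),
    )
```

With scalar stand-ins for frames and an absolute difference as the 
distance, motion concentrated early in the GOP pulls the P1 frame 
toward the start rather than the midpoint. 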



Assume that the GOP size is equal to M; the number of P1 
frames used is N; the number of scene changes of Type 1 is u, 
and the number of adjacent pairs of scene changes of Type 1 is 
v. Then the number of searches S is as follows: 

$$ S = \binom{M - 2u + v - 1}{N} \qquad (29) $$

since it searches through all combinations of N frames of P1 
type. The number of positions searched is (M - 2u + v - 1) 
because the first frame is always coded as an I frame and the 
two neighboring frames for each scene change of Type 1 are 
excluded. However, if there are v pairs of scene changes 
there are v common neighboring frames, so the total number of 
exclusions due to scene changes becomes (2u - v). Since N is 
a fixed number, less than five, this is a polynomial-time 
search algorithm. The GOP size M is usually less than 20 and 
(2u - v) is always positive, whereby in such instances even an 
exhaustive search is not computationally too expensive (S is 
roughly on the order of 100). If there are several optimal 
answers, a solution is chosen where the intervals (frame 
number difference) between reference frames are the most 
evenly distributed, i.e., a solution where the deviation from 
the mean of interval sizes is the smallest. 
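
The size of the search can be checked with a one-line count. The 
sketch reads S as the number of ways to choose N positions out of 
the (M - 2u + v - 1) available ones, an interpretation drawn from 
the "all combinations" wording; the function name is hypothetical. 

```python
import math


def number_of_searches(m: int, n: int, u: int = 0, v: int = 0) -> int:
    """Count the P1 placements examined by the exhaustive search."""
    positions = m - 2 * u + v - 1  # frames still open to a P1 designation
    return math.comb(positions, n)
```

For a 15-frame GOP with two P1 frames and no Type 1 scene changes 
this gives 91 placements, in line with the stated order of 100. 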

The OSA algorithm (Optimal Spacing Algorithm) can in 
another embodiment also be improved further in its adaptivity 
by using one extra reserved P1 frame as in the TAMI algorithm. 
Here B2 frames are used depending on the local motion 
activity. If the average distance between reference frames is 
above a threshold, one more P1 frame is assigned by default 
(i.e., it becomes an ((N+1)-P) scheme using B1 frames). 
Otherwise the B2 frames are used (an N-P scheme using B2 
frames). Further improvement may be obtained by adapting the 
number of P1 frames to a criterion of local motion activity. 

In steps 61-66 (Fig. 11) of this OSA algorithm 60 the 
computation is not overly expensive when the distance measure 
uses histogram or variance approaches, i.e., the order of the 
number of operations per frame is $O(N_{pix}) \approx 10^5$, 
where $N_{pix}$ is the number of pixels, resulting in about 
$S \times 10^5 \approx 10^7$ operations for an OSA algorithm 
with 352 x 240 image size, where S is the number of searches 
given in equation 29. However, if a motion compensation error 
approach is used (see Equation 25 infra), the complexity 
becomes about $S \times 10^{10} \approx 10^{12}$, assuming S 
is $10^2$. Hence a fast motion vector search algorithm has to 
be used to make the OSA algorithm practical. 

Highly accurate motion vector information for 
segmentation is not required; therefore the computation can be 
reduced by a factor of 2 in both dimensions via pixel 
subsampling, i.e., the computation is cut to about 1/4. 
Extra savings can be obtained by using a backward telescopic 
search, as shown in Fig. 13, where the sequence of the search 
runs backward in time, opposite to the direction of the 
conventional telescopic search (see Fig. 6). Note that the 
accuracy of the backward motion vector is better than that of 
the usual forward telescopic search. The previous motion vector 
used as a prediction for a current search is more correlated 
to the current macroblock in the backward search than in the 
forward search, because the current macroblock to be matched 
is always fixed, whereas that of the forward search is always 
changing. 

Experimental Results: 
FBR-TAMI algorithm: 

For testing of the FBR-TAMI algorithm 10 (see Fig. 4), 
simulations were performed using a tennis sequence in the GIF 
format (352 by 240 pixels) for different N-P schemes for N = 
0, 1, 2, and 3. The Huffman coding tables in the MPEG 
standard MPEG91 were used for the variable length coding. 
The difference of histogram (DOH) was used for the distance 
measure in the simulations for the TAMI algorithm because of 
its simplicity. 

Fig. 5 shows the SNRs for the unlimited P1 (VBR TAMI) 
and limited P1 (fixed TAMI) embodiments. It shows that using 
a B1 frame is better than using a P2 frame, even when many 
scene changes of Type 0 are detected due to very busy motion 
activity. Hence for the rest of the simulations except for 
VBR-TAMI results, the limited P1 scheme is used. Figs. 
14(a) - 14(e) show SNR results for the temporally smooth 
region, and its corresponding bit rate per frame is also 
provided in Figs. 15(a) - 15(e). More specifically, Figs. 
14(a) - 14(e) show respective SNR vs. frames or images with 
little motion, for a tennis scene at an average bit rate of 
736.5 Kbit/sec for conventional 4-P, and 0-P, 1-P, 2-P, and 
3-P schemes, respectively. Figs. 15(a) - 15(e) show the bit 
rates for images with little motion corresponding to Figs. 
14(a) - 14(e), respectively. From Figs. 14(a) and 14(e), from 
amongst the several respective N-P schemes, the 1-P scheme 
performs the best in terms of SNR and subjective quality, and 
its SNR is about 1 dB better than the conventional 4-P fixed 
scheme. Figs. 16(a) through 16(e) show the results for the high 
temporal activity region with an abrupt scene change, and the 
corresponding bit rate results are given in Figs. 17(a) through 
17(e), respectively. The SNR of the 1-P scheme is also 
slightly better (by 0.5 dB) than that of the conventional 4-P 
scheme even in the case with a scene change. The decoded 
image quality of the 1-P scheme has been determined to compare 
favorably with that of the conventional 4-P scheme. 
The performance improvement using the 1-P scheme is more 
noticeable for low bit rate coding (300 Kbit/sec, Tennis), 
and the SNR and the bit rate results for the target bit rate 
are given in Figs. 18(a) - 18(e) and Figs. 19(a) - 19(e), 
respectively. Averages of SNRs and bit rates for Fig. 14, 
Fig. 16, and Fig. 18 are given in tables shown in Figs. 20, 
21, and 22, respectively. They show that the present 1-P 
FBR-TAMI scheme outperforms the conventional fixed 4-P scheme 
in most of the cases. 



VBR-TAMI algorithm: 



Fig. 23(a) and Fig. 23(b) show SNR and its bit rate 
comparison, respectively, between VBR-TAMI and 0-P FBR-TAMI, 
where they have the same average bit rate of 663 Kbit/sec, in 
this example. Hence there is less temporal variation in the 
picture quality. As expected, the SNR for VBR-TAMI is more 
stable than FBR-TAMI. At the cost of variable bit rate 
output, the present VBR coding scheme can handle scenes with 
rapid motion without significant loss of quality. 

Optimal spacing algorithm: 



Figs. 24(a) - 24(e) show relative distances between a 
current frame and the first frame (frame 120) using the five 
different measurement methods, e.g. DOH, HOD, BH, BV, and MCE 
methods of measurement, respectively. In the plots, it can be 
seen that the DOH and MCE criteria are more sensitive to 
global motion rather than to local motion, while the other 
three criteria are sensitive both to global and to local 
motion as discussed above. More specifically, Figs. 25(a) - 
25(e) show curves for SNR vs. frame number for an OSA using 
DOH, HOD, BH, BV, and MCE methods of distance measurement, 
respectively. HOD performs the best because its sensitivity 
to local motion is more important when there is no global 
motion between frames 120 and 180, only some local motion. 



The table of Fig. 26 shows that the MCE criterion 
produces the best overall performance for various motion 
activities. In the table, frames 89-149 represent a scene 
with motion, frames 120-180 a scene with an abrupt scene 
change at frame 148, and frames 0-60 a scene with very high 
global motion (zooming out from frame 30 to 60). The good 
performance of MCE may be because MCE is the nearly ideal 
distance measure and is expected to be more robust to various 
motion activities. 

Figs. 28(a) - 28(e) show SNR results using the adaptive 
optimal spacing algorithm with B2 frames for the five 
different distance measures DOH, HOD, BH, BV, and MCE, 
respectively. BH performs the best because it is also a good 
measure of local motion and there is no global motion between 
frames 120 and 180. However, the table in Fig. 27 shows that 
the BH criterion produces the best overall performance for 
various motion activities. Comparisons between the tables of 
Figs. 26 and 27 show that the performances with different 
distance measures are similar to one another. Comparisons 
between the tables of Figs. 20 and 27 show that the 
performances of FBR-TAMI and OSA are also similar, with slight 
differences depending on what kind of distance measure is 
used. 

As illustrated and described above, the present invention 
provides that positions of video reference frames are made 
adaptive to input video motion activities, and bit rate 
control is used to exploit masking effects in visual 
perception. Three major algorithms, FBR-TAMI, VBR-TAMI, 
and OSA, are presented, and shown to compare favorably with 
conventional motion interpolation coding with fixed GOP 
structures. Although FBR-TAMI and OSA are similar in their 
performances, TAMI has lower algorithmic complexity. The 
trade-off in this approach is that a scheme with a lower 
number of predicted frames has a better compression ratio at 
the cost of larger encoding delay. Embodiments of the present 
invention are expected to be useful in several different areas 
such as variable bit rate coding, low bit rate coding, 
coding for storage on CD-ROM, and temporally adaptive 3-D 
sub-band coding, for example. The FBR-TAMI algorithm is 
particularly suitable for low bit rate coding such as video 
conferencing or video phone, where the rarity of rapid motion 
is very advantageous, and it is also suitable for storage 
applications such as CD-ROM, where a relatively large delay in 
encoding can be tolerated. 

In Fig. 29, a composite of three curves shows a 
comparison between the TAMI and OSA embodiments of the 
invention relative to image movement. The uppermost curve 120 
shows a plot of image movement versus frame number for a GOP 
of 15 frames. In this example, the image movement curve 122 
shows a region 124 of "busy temporal activity" between frames 
1 and 7, and a region 126 of "low temporal activity" 
between frames 8 and 15. As shown, in region 124 the P frames 
occur more frequently, or are closer to one another, 
because there is more data change, that is, there is 
greater image movement from one frame to another. 
Contrariwise, in region 126, where image movement is 
substantially less, the P frames occur less frequently, or are 
further apart from one another, because there is less data 
change or image movement from one frame to another. In the 
curve section 128, TAMI processing for coding frames is shown 
as a plot of frame distance, that is, the global picture 
movement between frames, relative to frame number. The frame 
distance or movement at which a Type 0 threshold is detected 
is shown by the broken line 130. As shown, each time the 
frame distance or image movement between frames exceeds the 
Type 0 threshold 130, the frame immediately preceding the 
occurrence of the Type 0 threshold is designated as a P2 
frame. As previously explained, in this example, a GOP 
consists of 15 frames, designated by frame numbers "0" through 
"14", with the "15th" designated frame actually being the 
first frame of the next GOP. The first frame is always 
designated as an "I" frame. The frames located between 
any two reference frames, such as P frames and I frames, are 
designated "B" frames, as shown. Note that when using the 
TAMI processing as shown in curve section 128, the P frames 
are further apart in the region of low temporal activity 126, 
relative to the region of busy temporal activity 124. By 
using OSA processing, as shown in the curve section 132, 
certain of the P frame designations are shifted 
right or left to make the P frames as equidistant from one 
another as possible. Accordingly, as shown, TAMI designated 
frame 10 as a P frame, whereas through OSA processing, in this 
example, the P frame is shifted to the left from frame 10 to 
frame 9. Similarly, in TAMI curve 128, frame 13 is designated 
as a P frame, whereas through OSA processing, the P frame is 
shifted from frame 13 to frame 12, as shown in curve section 
132. Also as a result of this shifting, frame 9, designated 
as a B frame in TAMI, is designated as a P frame in OSA, frame 
10, designated as a P frame in TAMI, is designated as a B frame 
in OSA, and frame 13, designated as a P frame in TAMI, is 
designated as a B frame in OSA. As a result, the P frames in 
region 126 are more equidistantly spaced from one another, 
making more efficient use of bit allocations via OSA, relative 
to TAMI. 
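
By way of illustration only, the Type 0 threshold rule of curve section 128 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of the invention: the function name `tami_gop_structure` and the `movement` list are hypothetical, the distance between a reference frame and the current frame is approximated by summing per-frame movement values instead of using one of the distance measures discussed above, and the I1/I2 and P1/P2 distinctions and the OSA shifting are omitted.

```python
def tami_gop_structure(movement, threshold, gop_size=15):
    """Label frames 0..gop_size-1 as 'I', 'P', or 'B'.

    movement[k] is an assumed stand-in for the image movement between
    frame k-1 and frame k; the distance from the reference frame to the
    current frame is approximated by summing these values.
    """
    labels = ["B"] * gop_size
    labels[0] = "I"              # the first frame of a GOP is always an I frame
    ref = 0                      # index of the most recent reference frame
    cur = 1
    while cur < gop_size:
        distance = sum(movement[ref + 1:cur + 1])
        if distance > threshold and cur - 1 > ref:
            # Threshold exceeded: the immediately previous frame becomes
            # a P frame and serves as the new reference.
            ref = cur - 1
            labels[ref] = "P"
            # cur is not advanced, so it is re-measured from the new reference.
        else:
            cur += 1
    return labels


# Busy activity in frames 1-7, low activity in frames 8-14 (hypothetical data).
movement = [0] + [3] * 7 + [1] * 7
print("".join(tami_gop_structure(movement, threshold=7)))
```

With this hypothetical input, the P frames fall close together in the busy region and far apart in the quiet region, which is the qualitative behavior shown for TAMI in Fig. 29.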



Adaptive selection of the number (N) of reference frames: 

A bit rate control algorithm, mainly for TAMI, to allow 
variable N for another embodiment of the invention will now be 
described. Note that as indicated above, N is the number of 
P1 frames. One simple approach to adapt N is to use a 
constant threshold for detection of a Type 0 scene change, and 
use one P frame for each scene change detected. To adapt N subject to a 
fixed bit rate constraint, a variable bit allocation as 
described above is used. The target bit allocations for 
different picture types are updated accordingly. 

To describe the algorithm, let the channel bit rate 
(bits/sec) be denoted by $R$, the GOP size by $M$, the expected GOP bit 
rate by $G$, and the target bit allocation for picture type $t$ by $D_t$. 
The bit allocations for I1, I2, P1, P2, and B1 frames are 
$D_{I1} = C_{I1}x$, $D_{I2} = C_{I2}x$, $D_{P1} = C_{P1}x$, 
$D_{P2} = C_{P2}x$, and $D_{B1} = C_{B1}x$, respectively, 
where $x$ is a common factor and $C_{I1}$, $C_{I2}$, $C_{P1}$, $C_{P2}$, and $C_{B1}$ are 
constants for I1, I2, P1, P2, and B1 frames, with $C_{I2} = C_{P2} = C_{B1}$. 
Unlike the constant bit allocation scheme described 
above, $x$ now is allowed to vary from GOP to GOP, thereby 
providing the present variable bit allocation scheme. Note 
that B2 frames that were used for a limited variation of N by 
1 are not required. The common factor $x$ is determined by the 
relationship $R = 2G$, where $G = (C_{I1} + N C_{P1} + (M - N - 1) C_{B1})x$. 
The following formula for target bit allocation results: 

$$D_t = C_t x = \frac{C_t R}{2\left(C_{I1} + N C_{P1} + (M - N - 1) C_{B1}\right)} \qquad (30)$$

This bit allocation is updated by use of equation (30) at 
the beginning of each GOP. 
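
As a hedged illustration of this update, the following Python sketch solves the relationship R = 2G for the common factor x and returns the per-type targets. The function name and the numeric constants are hypothetical; only the relationships among R, G, M, N, x, and the constants are taken from the text. The factor of 2 is treated here as the number of GOPs per second, which is an assumption.

```python
def target_bit_allocations(R, M, N, C, gops_per_second=2):
    """Return the target bit allocation D_t for each picture type t.

    R: channel bit rate in bits/sec; M: GOP size; N: number of P1 frames.
    C: constants keyed by picture type ("I1", "I2", "P1", "P2", "B1").
    gops_per_second: assumed to be 2, matching the relationship R = 2G.
    """
    G = R / gops_per_second                      # expected bits per GOP
    weight = C["I1"] + N * C["P1"] + (M - N - 1) * C["B1"]
    x = G / weight                               # common factor for this GOP
    return {t: c * x for t, c in C.items()}


# Hypothetical constants chosen only for illustration, with C_I2 = C_P2 = C_B1.
C = {"I1": 10.0, "I2": 2.0, "P1": 5.0, "P2": 2.0, "B1": 2.0}
D = target_bit_allocations(R=300_000, M=15, N=2, C=C)
gop_bits = D["I1"] + 2 * D["P1"] + (15 - 2 - 1) * D["B1"]
```

Because x is recomputed for each GOP, a GOP with more P1 frames receives a smaller x, so every picture type in that GOP is given proportionally fewer bits and the GOP total stays at G.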

A fast heuristic approach for positioning (BSE-TAMI): 

Another embodiment of the invention, designated BSE-TAMI 
(Binary Search Equidistant TAMI), will now be described. Assume 
N SSPs (scene segmentation points or Type 0 scene changes) are 
detected by the scene change detection algorithm 14 (see Fig. 
4) using a constant threshold. Assume that the distance 
measure is an integer and, as a basis for developing a 
heuristic, is a monotonically increasing function with respect 
to the time separation between two frames. HOD (histogram of 
difference) is used in such a simulation to measure motion by 
distance measurements, because it generally tends to be 
monotonic. 



The problem is to find nearly equidistant positions of 
SSPs or Type 0 scene changes. The present fast heuristic 
search is for positions that are close to the best positions. 
Fig. 30 is an example where two SSPs or Type 0 scene changes 
are detected by an SSP detector 14 using an initial threshold $\tau_0$, 
which produces N SSPs. Denote the distance between the 
last SSP and the end frame of a GOP by $a(\tau)$. Also denote 
$a_0 = a(\tau_0)$. The problem is to start with $\tau_0 > a(\tau_0) = a_0$ and to find 
the smallest $\tau$ in $[a_0, \tau_0]$ satisfying $\tau > a(\tau)$. Since the 
distance measure is assumed to be monotonic, $a(\tau)$ either 
increases or stays constant as $\tau$ decreases from $\tau_0$. More 
specifically, as $\tau$ decreases from $\tau_0$, eventually $\tau < a(\tau)$ will 
be attained. In other words, point A crosses or reaches point 
B at some point as the threshold is decreased (see Fig. 30). 

Since $\tau$ is an integer and has a finite range of $[a_0, \tau_0]$, 
a binary search is used for the solution $\tau^*$. In other words, 
the middle point 

$$\tau_m = \frac{a_0 + \tau_0}{2}$$

is computed and compared with $a(\tau_m)$. If $\tau_m > a(\tau_m)$, a search is 
conducted on the lower half of $[a_0, \tau_0]$; if $\tau_m < a(\tau_m)$, the 
upper half is searched. One continues by computing $a(\tau)$ for 
the new middle points of the new search region until a 
stopping criterion is satisfied. 



The terms to be used in the BSE-TAMI algorithm are defined 
as follows: 

• b: the bottom end of the search region. 

• t: the top end of the search region. 

• m: the middle point of the search region. 

• SSPDET(N, $\tau$): SSP detector using a threshold $\tau$, 
where the maximum allowable number of SSPs is N. 

• N($\tau$): the number of SSPs detected by SSPDET(N, $\tau$). 
Note that N($\tau$) $\le$ N. 

• pos($\tau$): positions of SSPs detected by SSPDET(N, $\tau$). 

• d$\tau$: previous threshold m satisfying m > a(m). 

• dpos: previous positions corresponding to d$\tau$. 

The algorithm to find N positions is described generally 
as follows: 

1. Pick an initial threshold $\tau_0$ that produces N SSPs. 
Run SSPDET(N, $\tau_0$) and compute $a_0 = a(\tau_0)$. 
Set d$\tau$ ← $\tau_0$ and dpos ← pos($\tau_0$). 

2. If $\tau_0 < a_0$, then go to step 6; otherwise 
{b ← $a_0$; t ← $\tau_0$}. 

3. m ← (b + t)/2. 
Run SSPDET(N, m) and compute a(m). 

4. If m > a(m), then { 
t ← m - 1; 
if N(m) = N, then {d$\tau$ ← m; dpos ← pos(m)} 
} otherwise 
{b ← m + 1}. 

5. Repeat steps 3 and 4 until m = a(m) or b > t. 

6. $\tau^*$ ← d$\tau$ and stop. 
dpos gives the desired positions of SSPs 
corresponding to $\tau^*$. 

In step 6, d$\tau$ becomes the desired solution because it is 
just before the position where $\tau$ becomes larger than $a(\tau)$, 
which means $\tau$ is closest to $a(\tau)$. A more detailed description 
of the algorithm for BSE-TAMI is shown in Fig. 31, for steps 
133 through 145. 
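
By way of illustration only, the six steps above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the flowchart of Fig. 31: the toy detector `make_toy_detector`, its `motion` input, and the function name `bse_tami` are hypothetical; the comparison in step 4 is read as "greater than or equal" so that an exact match m = a(m) is recorded; and the midpoint is computed with integer division because the threshold is assumed to be an integer.

```python
def make_toy_detector(motion):
    """Build a toy SSP detector and a(.) for one GOP (hypothetical stand-ins).

    motion[k] is an assumed integer movement between frames k-1 and k, so the
    distance between two frames grows monotonically with their separation.
    """
    end = len(motion) - 1

    def dist(i, j):
        return sum(motion[i + 1:j + 1])

    def sspdet(n_max, tau):
        """Mark an SSP whenever the distance from the last SSP exceeds tau."""
        positions, ref = [], 0
        for cur in range(1, end + 1):
            if len(positions) < n_max and dist(ref, cur) > tau:
                positions.append(cur)
                ref = cur
        return positions

    def a(positions):
        """Distance between the last SSP and the end frame of the GOP."""
        last = positions[-1] if positions else 0
        return dist(last, end)

    return sspdet, a


def bse_tami(sspdet, a, n, tau0):
    """Binary search for a threshold giving nearly equidistant SSPs."""
    pos0 = sspdet(n, tau0)                 # step 1
    a0 = a(pos0)
    d_tau, d_pos = tau0, pos0
    if tau0 < a0:                          # step 2: keep the initial result
        return d_tau, d_pos
    b, t = a0, tau0
    while b <= t:                          # step 5: stop when b > t
        m = (b + t) // 2                   # step 3
        pos = sspdet(n, m)
        a_m = a(pos)
        if m >= a_m:                       # step 4 (">=" is an assumption)
            t = m - 1
            if len(pos) == n:
                d_tau, d_pos = m, pos
            if m == a_m:                   # step 5: stop when m = a(m)
                break
        else:
            b = m + 1
    return d_tau, d_pos                    # step 6


sspdet, a = make_toy_detector([0] + [2] * 15)
tau_star, positions = bse_tami(sspdet, a, n=2, tau0=12)
```

The detector is re-run once per midpoint, so the number of detector runs grows with the logarithm of the width of the search range rather than with the width itself.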

If a brute force search for $\tau^*$ were used, the required 
computation would be on the order of $10^4$, assuming $\tau_0 \approx 10^4$ and 
$a_0 \approx 0$. The required computation using a binary search is $[(\log_2 L) 
+ 1]$ when the data size is L, which becomes about 15 when L = 
$10^4$. About a thousand-fold computational saving is obtained 
using the binary search. When monotonicity fails, due to 
periodic motion for example, the heuristic is to use the 
initial SSP positions, pos($\tau_0$). 
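
The arithmetic behind this estimate can be checked with a short Python sketch. It assumes, as one reading of the bracket notation, that the binary search cost is the ceiling of the base-2 logarithm of L plus one, and it assumes a search range of about ten thousand threshold values; both are interpretations rather than statements taken from the text.

```python
import math

L = 10 ** 4                                   # assumed size of the search range
binary_steps = math.ceil(math.log2(L)) + 1    # 14 + 1 = 15 evaluations
saving = L / binary_steps                     # roughly 667, on the order of 10^3
```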

This approach of employing a binary search can easily be 
combined with the adaptive N scheme previously discussed. The 
advantage of this binary search approach is that it is fast 
and very simple compared to E-TAMI (equidistant TAMI). The 
disadvantage is that it fails when the monotonicity assumption 
is not satisfied. However, the assumption is valid for most 
GOPs in ordinary video material. 

Hardware System/Software Implementation: 

The present inventors conceived a software implementation 
of the TAMI and OSA embodiments of the invention, as shown in 
Figs. 32 through 38, which will now be described. Note that 
the hardware implementation for the TAMI embodiment of the 
invention is shown in Fig. 3, as previously described. 

In Fig. 32, a flowchart for a software implementation 
of both the TAMI and OSA embodiments, as shown, 
includes steps 150 through 161. Note that this software 
implementation is applicable for all TAMI embodiments, 
including general TAMI, VBR TAMI, and FBR TAMI. More 
specifically, after starting the software routine in step 150, 
scene change detector step 151 is entered for detecting 
accumulated slow changes or complete slow changes, whereby 
different versions of the software routines may be required for 
the various embodiments of the invention. The next step 152, 
or software routine "I1 picture coding", provides for 
initialization of the group of pictures or GOP being 
processed. The first frame is coded as an I frame in 
correspondence to the MPEG standard. However, in the present 
invention, the first frame is more specifically coded as an I1 
picture providing full resolution coding, whereas, as 
previously described, I2 coding may be used for other frames 
within the GOP being processed in association with the 
detection of Type 1 scene changes, whereby the I2 coding is 
coarser than the I1 coding. The next step 153 determines when 
the current picture index or actual picture number (CPICI) 
corresponds to the first frame of a GOP. In this step the 
variable is set to be the first picture number (FPN). The 
next step 154 provides for an index "i" for a scene segment, 
which index is initially set to zero for the first scene 
segment, and successively increased by one as additional scene 
segments or frames are processed. 

The next step 155 processes the data through an encoding 
algorithm (MAIN), which includes some of the MPEG standard 
with the addition of new steps conceived by the inventors, as 
will be described below. 

In the next step 156, the index for a scene segment "i" 
is incremented by one for advancing the processing to the next 
scene segment. 

The next step 157 is a decision step for determining 
whether the index "i" is less than the number of scene 
segments (SCNUM) or loops as determined by the scene change 
detector in step 151. Note that a high number of segments is 
assigned if there are a high number of scene changes. In 
other words, the number of segments is directly proportional 
to the number of scene changes detected, thereby making the 
system an adaptive system. 

After the final scene segment has been processed through 
the loop including steps 155 through 157, decision step 157 
exits from the processing loop into step 158, in 
which the actual picture number corresponding to the first 
frame of the GOP just processed is incremented by 15. Note 
that in this example, as previously explained, the GOP size 
chosen is 15 in the preferred embodiments of the invention, 
but per the MPEG standard can be other than 15. 

The next step 159, SCDET, detects the scene changes for 
the next 15 frames, i.e. the frames of the next GOP to be 
processed. 

The next step 160 determines whether the current picture 
index or number CPICI, incremented by 15, is less than the 
last picture number LPN. If the answer is yes, the data for 
the current frame being processed is fed to step 154, for 
continuing processing in the loop represented by steps 154 
through 160. Once the last picture number LPN is processed, 
decision step 160 exits from the loop, ending the encoding 
process in step 161. 
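
By way of illustration only, the control flow of steps 150 through 161 can be sketched in Python as follows. The function name and the three callbacks `scdet`, `code_i1`, and `main` are hypothetical stand-ins for the routines of the flowchart, and the sketch assumes that the scene change detector always returns at least one scene segment per GOP.

```python
def encode_sequence(fpn, lpn, scdet, code_i1, main, gop_size=15):
    """Walk the control flow of Fig. 32 using caller-supplied callbacks."""
    segments = scdet(fpn)              # step 151: scene change detection
    code_i1(fpn)                       # step 152: I1 coding of the first frame
    cpici = fpn                        # step 153: current picture index
    while True:
        i = 0                          # step 154: index of the first segment
        while True:
            main(cpici, segments[i])   # step 155: MAIN encoding algorithm
            i += 1                     # step 156: next scene segment
            if not i < len(segments):  # step 157: compare with SCNUM
                break
        cpici += gop_size              # step 158: advance to the next GOP
        segments = scdet(cpici)        # step 159: SCDET for the next GOP
        if not cpici < lpn:            # step 160: compare with LPN
            break
    # step 161: end of the encoding process
```

The inner loop runs once per scene segment, so a GOP with more detected scene changes passes through the MAIN routine more times, which is the adaptive behavior noted for step 157.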

In Fig. 33, the scene change detection processing, SCDET, 
for an FBRTAMI or fixed bit rate TAMI, is shown. This 
flowchart is similar to the flowchart of Fig. 4, but provides 
greater details of the processing. Note that the SCDET for 
VBRTAMI is shown and described above in association with Fig. 
10. Similarly, the SCDET embodiment for the OSA embodiment of 
the invention is shown and described above in association with 
Fig. 11. 

In Fig. 33, the SCDET for the FBRTAMI or fixed bit rate 
TAMI begins with the loading of the GOP frames into memory in 
step 170. In step 171, counters in the system are initialized. 
In the example given for step 172, the scene index for a scene 
change is shown as being set to 2, the picture number of the 
first frame of a scene segment for a current frame is shown as 
set to 0, and the index for a scene segment is incremented by 
1 before proceeding to the next step. In step 173, scene 
segment data $f_0$ is copied into a frame reference memory $f_{ref}$, 
and a current picture frame counter is set to 1. Next, in 
decision step 174, a determination is made as to whether the 
distance or movement between a current frame $f_c$ and an 
immediately previous reference frame $f_{ref}$ is greater than a 
threshold T1 for a Type 1 scene change. If the answer is 
affirmative, step 178 is entered. If the answer is negative, 
step 175 is entered, for determining whether the motion 
between the current frame and a previous reference frame 
exceeds a Type 0 threshold T0. If affirmative, step 180 is 
entered, otherwise step 176 is entered. 

If step 178 is entered from step 174, the scene change 
Type is set to 1, the picture number c of the first frame of a 
scene segment is identified, and the last frame of a previous 
scene segment is identified by (c-1); and the index for the 
scene segment is incremented by 1, as shown. Next, in step 
179, the current frame data $f_c$ is copied to the reference frame 
data $f_{ref}$. From step 179, step 176 is entered for determining 
whether the end of the present GOP being processed has been 
reached, that is whether the 15th frame has been processed. 
If the answer is no, step 177 is entered for incrementing the 
current frame by 1 to proceed to the next frame, then the 
processing loops back into step 174. However, if the answer 
is yes, meaning that the frames within the GOP have been 
processed, step 184 is entered for processing the next GOP. 
Note that in Fig. 33, in the legend shown, "D(·,·)" 
designates a distance measure for the amount of motion between 
frames. Also, the notation "Cond. A: sct[scindex-1]=1 & 
(PNSCL[scindex-1]-PNSCF[scindex-1])=0" means that the scene 
change type of a previous segment is 1, and its scene duration 
is 0. 
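
By way of illustration only, the two-threshold test of steps 174 and 175 can be sketched in Python as follows. The function name `scdet`, the `dist` callback, and the scalar "frames" are hypothetical. The handling of a Type 0 change (steps 180 through 183) is not detailed in this passage, so the sketch assumes, consistent with the description of Fig. 29, that the frame immediately preceding the current frame becomes the new reference.

```python
def scdet(frames, t1, t0, dist):
    """Return a list of (scene_change_type, frame_number) for one GOP."""
    changes = []
    ref = 0                      # step 173: the first frame is the reference
    c = 1                        # current picture frame counter
    while c < len(frames):       # steps 176/177: loop over the GOP
        d = dist(frames[c], frames[ref])
        if d > t1:               # step 174: Type 1 (abrupt) scene change
            changes.append((1, c))
            ref = c              # step 179: current frame becomes the reference
        elif d > t0 and c - 1 > ref:
            # step 175: Type 0 change; assumed handling places the change
            # at the previous frame and re-measures frame c from there.
            changes.append((0, c - 1))
            ref = c - 1
            continue
        c += 1
    return changes


# Hypothetical one-value "frames": slow drift with one abrupt jump at frame 6.
frames = [0, 1, 2, 3, 4, 5, 50, 51, 52, 53, 54, 55, 56, 57, 58]
changes = scdet(frames, t1=20, t0=2, dist=lambda x, y: abs(x - y))
```

The if/elif structure means a frame is classified as at most one of the two types.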

Fig. 34 shows an "N-P TAMI SCDET" scene change detector 
for use in the TAMI encoder routine of Fig. 32. Note that a 
substantial number of the steps in the flowchart of Fig. 34 
for N-P TAMI SCDET are the same as the steps shown in a 
portion of the flowchart of Fig. 33, wherein the reference 
designations for these steps are identical. For example, the 
initialization steps 170 through 173 are the same for the 
flowcharts of Figs. 33 and 34. Steps 200 through 205 of the 
N-P TAMI flowchart of Fig. 34 are different from the SCDET 
FBRTAMI flowchart of Fig. 33. In the flowchart of Fig. 34, 
steps 174, 178, 201, and 202 detect a scene change of Type 1, 
whereas steps 180 through 183, and 200, detect a scene change 
of Type 0. Also, further note that these steps indicated for 
providing the Type 1 and Type 0 scene changes, together with 
step 179, provide an Exclusive OR function. Also note that 
steps 203 through 205 provide for the insertion or generation 
of default positions for P1 designated frames. For 
convenience, in this example, such default positions are 
designated as a scene change Type 2. 

The picture coding step 152 of the TAMI encoder flowchart 
shown in Fig. 32 is illustrated in greater detail in the 
flowchart of Fig. 35. Steps 250 through 256 provide for I1, 
I2, P1, P2, B1, and B2 coding, as illustrated. Step 250 
provides for a discrete cosine transform of the data. Step 
251 provides the main data compression, and is a quantizer 
having step sizes that can be adapted to provide different 
quantization levels. Step 252 provides variable length coding 
(VLC), such as Huffman coding, for example. The buffer control 
provided in step 256 is included as part of the MPEG standard 
bit rate control for making the output bit rate constant, 
whereby if excess bits are being used the quantizer is 
controlled for coarser quantization in order to reduce the bit 
use. Steps 253 and 254 provide inverse quantization and 
inverse discrete cosine transform processing, respectively. 
Step 255 provides for saving or storing in memory the decoded 
results from the inverse discrete cosine transform 254, which 
results are used for later motion compensation. Note that as 
shown in the legend, the present inventors elected to provide 
the quantizer in step 251 with 6 different default QS levels, 
whereby the greater the value of QS, the less resolution or 
coarser the quantization provided in step 251. As shown in 
this example, frames designated as I1 and P1 each have a 
designated quantization level of QS. Frames designated as 
bidirectional frames B1 or B2 each have quantization levels 
of 2QS. Frames designated as I2 have quantization levels of 
10QS, and frames designated as P2 have quantization levels of 
3QS. 
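
By way of illustration only, the default quantization levels listed above can be tabulated in Python as follows. The multipliers are taken from the text; the function names and the plain uniform quantizer are assumptions made for the sketch, which stands in for steps 251 and 253 without the DCT, the quantization matrix, or the buffer control.

```python
# Quantizer step size for each picture type, as a multiple of the base value QS.
QS_MULTIPLIER = {"I1": 1, "P1": 1, "B1": 2, "B2": 2, "P2": 3, "I2": 10}


def quantizer_step(picture_type, qs):
    """Return the step size used for the given picture type."""
    return QS_MULTIPLIER[picture_type] * qs


def quantize(coefficient, step):
    """Uniform quantization (an assumed, simplified model of step 251)."""
    return round(coefficient / step)


def dequantize(level, step):
    """Inverse quantization (an assumed, simplified model of step 253)."""
    return level * step


def reconstruction_error(coefficient, picture_type, qs):
    """Absolute error after quantizing and dequantizing one coefficient."""
    step = quantizer_step(picture_type, qs)
    return abs(coefficient - dequantize(quantize(coefficient, step), step))
```

For a given coefficient, a larger multiplier gives a larger step and therefore, in general, a larger reconstruction error, which is why the I2 frames are coded most coarsely in this example.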

The MAIN encoding algorithm shown as step 155 in Fig. 32 
for the TAMI encoder is shown in detail in the flowchart of 
Figs. 36A and 36B for steps 260 through 278, and 280 through 
285, respectively. More specifically, in step 260 a count of 
past P1 frames is kept, as designated by NP1, which initially 
has a "0" count as shown in the illustration. In step 261, a 
determination is made as to whether a scene change of Type 1 
is attained. If the answer is yes, step 262 is entered for 
equating the current picture number FN to the picture number 
of the "ith" frame of a scene segment. Next, a decisional 
step 263 is entered for determining whether the picture 
number represents the last frame of the GOP. If so, step 264 
is entered for coding the frame as an I1 frame; if not, step 
275 is entered for coding the frame as an I2 frame. 

If in step 261 a scene change of Type 1 is not detected 
for the current frame being processed, step 273 is then 
entered for determining whether the motion between frames for 
the frame being processed is a Type 0 change, that is whether 
the Type 0 scene change has been attained. If not, step 265 
is entered. If so, step 274 is entered for decrementing the 
first picture number of the current scene segment by 1, as 
shown. Note that the further processing steps for the I1 
coding step 264 and the I2 coding step 275 are shown in 
the flowchart of Fig. 35, described above. 

In step 265, it is determined whether the duration of the 
current scene segment being processed is zero or not. If so, step 156 
of the flowchart shown in Fig. 32 is entered. If not, step 
266 is entered to set the picture number to the last frame 
number of the segment being processed. Next, decisional step 
267 is performed for determining whether the frame position is 
the last frame of the GOP. If yes, step 276 is entered for 
performing telescopic motion estimation for all B frames 
between the first and last scene segments of the associated 
picture number, whereafter step 277 is entered for I1 coding 
via steps 250 through 256 in the flowchart of Fig. 35. 
However, if in step 267 it is determined that the frame 
position is not the last frame position, step 268 is entered 
for performing motion estimation for a P frame and preceding B 
frames via steps 290 through 297 of the flowchart of Fig. 37. 
After step 297 of the flowchart of Fig. 37, step 269 is 
entered for prediction processing in accordance with the MPEG 
standard. Next, step 270, a decision step, is entered for 
determining whether the next scene has a Type 1 scene change, 
and whether any P1 frames were previously detected. If the 
answer is no, P2 coding is conducted in step 278 via steps 250 
through 256 of Fig. 35. If the answer is yes, P1 coding is 
conducted via steps 250 through 256 of Fig. 35. Next, step 
272 is entered for incrementing by 1 the count for the number 
of past P1 frames. 

Step 280 (Fig. 36B) is then entered for setting the 
current picture number to the picture number of the first 
frame in the last scene segment incremented by 1. Next, 
interpolation step 281 is entered for conducting interpolation 
in accordance with the MPEG standard. Thereafter, step 282 is 
entered for determining whether the number of P1 frames is 
equal to 0. If the answer is yes, the B2 coding step 283 is 
entered; if the answer is no, the B1 coding step 285 is 
entered. Note that the substeps 250 through 256 for steps 
283 and 285 are shown in the flowchart of Fig. 35, as 
previously described. Next, decisional step 284 is performed 
for determining whether the current picture number PN is less 
than the picture number of the last frame of a scene segment 
"i". If the answer is yes, the processing loops back to step 
281, whereas if the answer is no, the process proceeds to step 
156 (see Fig. 32) for incrementing by 1 the index "i" for a 
scene segment. 

For the MAIN step 155 (see Fig. 32) or MAIN encoding 
algorithm, as described for the flowchart of Figs. 36A and 
36B, the motion estimation step 268 for determining the motion 
associated with P frames is shown in detail in the flowchart 
of Fig. 37. As shown, the first step 290 is for setting the 
current frame number to the picture number of the "i" scene 
segment incremented by 1. Next, in step 291, the forward 
motion vector search between the current frame and the picture 
number scene segment of the last reference frame is computed. 
Next, the current frame number is incremented by 1 in step 
292. Thereafter, step 293 is entered for determining whether 
the current frame is less than the picture number of the last 
frame of a scene segment. If the answer is yes, step 291 is 
entered through a loop, whereas if the answer is no, step 294 
is entered for setting the current frame number to the picture 
number of the last scene segment decremented by 1. Next, step 
295 is entered for conducting a backward motion estimation, 
using in this example the standard algorithm from the MPEG 
standard. Next, step 296 is entered for decrementing by 1 the 
current frame number FN. Thereafter, step 297 is entered for 
determining whether the current frame number is greater than 
the picture number for the first frame of the scene segment of 
the first frame. If the answer is yes, the processing loops 
back to step 295, whereas if the answer is no, the processing 
proceeds then to the prediction step 269 shown in Fig. 36A. 

In the flowchart of Fig. 36A for the MAIN encoding 
algorithm of the TAMI encoder of Fig. 32, the motion 
estimation step 276 for I frames is shown in detail in the 
flowchart of Fig. 38 for steps 300 through 307. Note that the 
flowchart of Fig. 37 for the motion estimation steps for P 
frames is almost identical to the flowchart of Fig. 38. In 
other words, steps 290 through 292 and 294 through 297 of the 
flowchart of Fig. 37 are identical to steps 300 through 302, 
and 304 through 307, respectively, of the flowchart of Fig. 
38. The only difference between the MEP and the MEI 
processing is between steps 293 of the former and 303 of the 
latter: step 293 includes the last frame in the associated 
determination step, whereas step 303 excludes the last frame, 
and therefore is only determinative of whether the current 
frame number is less than the last frame. 
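
By way of illustration only, the order in which frames are visited by the forward and backward sweeps of Figs. 37 and 38 can be sketched in Python as follows. The function name and the `include_last` flag are hypothetical; the flag models the stated difference between step 293 (last frame included) and step 303 (last frame excluded). The sketch only lists frame numbers, performs no motion search, and assumes at least one frame lies between the two reference frames.

```python
def motion_estimation_order(first, last, include_last):
    """Return (forward, backward) lists of frame numbers to be searched.

    first, last: picture numbers of the first and last frames of the segment.
    include_last: True for the P-frame case (step 293), False for the
    I-frame case (step 303).
    """
    forward = []
    cur = first + 1                      # step 290 / 300
    while True:
        forward.append(cur)              # step 291 / 301: forward search
        cur += 1                         # step 292 / 302
        keep_going = cur <= last if include_last else cur < last
        if not keep_going:               # step 293 / 303
            break

    backward = []
    cur = last - 1                       # step 294 / 304
    while True:
        backward.append(cur)             # step 295 / 305: backward search
        cur -= 1                         # step 296 / 306
        if not cur > first:              # step 297 / 307
            break
    return forward, backward
```

In the P-frame case the forward sweep reaches the last frame, which must itself be predicted, whereas in the I-frame case the last frame is intra coded and needs no forward search.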

A hardware system for permitting various embodiments of 
the invention as described above to be accomplished will now 
be described in greater detail than for the systems of Figs. 3 
and 9. With reference to Fig. 39, the system includes a scene 
change detector 310 for receiving video data, and designating 
the various frames as I, P, or B, via detection of scene 
changes of Type 0 and/or Type 1. The associated GOP is stored 
within the scene change detector 310. The Control Signal CS 
is obtained from a microprocessor 334, programmed to carry out 
the required steps, as previously described. One output 
signal on line A1 is a picture mode signal (PMS), which is a 
control signal identifying the designation of the frame being 
processed as either an I, P, or B frame. This signal is 
connected as an input signal to a motion compensator 318, 
which is described in greater detail below. Another output signal, 
along line A2, is a frame output signal. This signal is 
connected to a summing junction 312, and to the motion 
compensation module 318 via G1. An output along line G0 from 
motion compensator 318 is connected to the summing junction 
312. The output signal or data of the summing junction 312 is 
connected to a discrete cosine transform module 314, which is 
connected in cascade with a quantization module 316, variable 
length coding module 324, multiplexer module 328, bit counter 
module 330, and a buffer 332. Feedback lines J1 and J2 are 
connected from buffer 332 and bit counter 330, respectively, 
to a bit rate controller 326, the output of which is connected 
along line J0 to the quantizer module 316. Another output of 
the quantizer module 316 is connected to an inverse quantizer 
320 ($Q^{-1}$), the output of the latter being connected as an input 
to an inverse discrete cosine transform module 322. The 
output of the inverse discrete cosine transform (IDCT) module 
322 is connected to another summing junction 323, which 
junction also receives an output signal along G0 of the motion 
compensation module 318. The output from summing junction 
323 is connected via G2 to an input of the motion compensation 
module 318. Note that the portions of the encoder enclosed 
within a dashed line area designated 308 are representative of 
a typical MPEG video encoder known in the art. The present 
inventors added the scene change detector module 310, bit rate 
controller 326, and motion compensation module 318 to the 
prior encoder, for modifying the same to accomplish the 
various embodiments of the present invention. Note that the 
output line MV from the motion compensation module 318 to the 
multiplexer (MPX) 328 provides motion vector bits to the 
latter. Also, the inverse quantizer 320 and inverse discrete 
cosine transform module 322 simulate a decoder, as is 
conventional in the art. The bit counter 330 serves to count 
bits generated in the system for determining the output 
behavior of the system, in order to control the quantizer 316 
in a manner ensuring that the number of bits being utilized does 
not exceed the capability of the system. The bit rate 
controller 326 adjusts the coarseness of quantizer 316 to 
accomplish this result. 
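
By way of illustration only, the feedback from buffer 332 to quantizer 316 can be sketched in Python as follows. The control law shown is hypothetical and is not taken from the text: the function name, the fullness thresholds, the unit adjustment, and the permitted range of the quantizer scale are all assumptions chosen to show the direction of the adjustment, namely coarser quantization when excess bits are being used.

```python
def update_quantizer_scale(qs, buffer_bits, buffer_size,
                           qs_min=1, qs_max=31, high=0.75, low=0.25):
    """Return an adjusted quantizer scale based on buffer fullness.

    All thresholds and limits are illustrative assumptions.
    """
    fullness = buffer_bits / buffer_size
    if fullness > high:
        qs += 1          # too many bits in the buffer: quantize more coarsely
    elif fullness < low:
        qs -= 1          # buffer nearly empty: quantize more finely
    return max(qs_min, min(qs_max, qs))
```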

The scene change detector module 310 will now be 
described in substantially greater detail with reference to 
Figs. 40 through 45. Further reference is also made to the 
algorithms previously described above, particularly the 
algorithm of Fig. 32. As shown in Fig. 40, the scene change 
detector 310 includes a frame memory 336 connected to a 
distance computation unit 338, and a scene change detector 
controller 346, as shown. The frame memory 336 is also 
responsive to a scene change detector control signal (SCDCS). 
The frame memory 336 is a standard frame memory for 16 frames 
in this example, assuming a GOP of 15 frames. The distance 
computation unit 338 computes the distances or motion between 
current and subsequent or following frames, and is also 
responsive to the SCDCS signal. It is also responsive to a 
reference frame number signal from a Type 0 scene change 
detector module 342, or from a Type 1 scene change detector 
module 340, as shown. Note that this reference frame number 
signal is a feedback signal to which the distance computation 
unit responds by resetting the reference frame positions used in distance 
or motion computation between frames, as previously discussed. 
The Type 1 scene change detector module 340 also provides 
output signals to both the Type 0 scene change detector 342, 
and to the GOP structure generation unit 344, as shown. The 
latter two scene change detector modules 340 and 342 are 
discussed in greater detail below. The GOP structure 
generation unit 344 is controlled via the SCDCS control 
signal, and provides an output along E3 to the scene change 
detector controller 346, and receives a signal along E2 from 
the latter. Controller 346 also receives a signal via F0 from 
the frame memory module 336, and provides a frame output 
signal on line or bus A2, and the picture mode signal PMS on 
signal line A1. The GOP structure generation unit 344 detects 
positions used to generate the complete GOP structure or map 
in accordance with the MPEG standard. Also, note that the SCD 
controller module 346 can be provided by a small 
microprocessor or by hardwired logic for the timing, data 
flow, synchronization, and so forth. 

The distance computation unit 338 is shown in greater 
detail in Fig. 41. As shown, a histogram module 348 is 
included for providing a histogram of the previous reference 
frame from data received along B1, and another histogram 
module 350 is included for producing a histogram of the 
current frame, the data for which is received along line B0. 
The histogram module 348 is connected to a memory 352 for 
storing the previous reference frame data until the memory 352 
is reset by the scene change detector module 340 or scene 
change detector module 342. An output of memory 352 is 
connected to an absolute difference unit 354, which also 
receives the histogram data from histogram module 350 for the 
current frame. The absolute difference therebetween is 
computed by unit 354 and outputted to an adder module 356, the 
output of which is fed along line B2 as the histogram 
difference to the scene change detector modules 340 and 342. 
Note that although the example of Fig. 41 for the distance 
computation unit 338 shows the use of histogram modules 348 
and 350, block variance processing, histogram of difference 
processing, and so forth, could have alternatively been used 
in place of the histogram technique. These other techniques 
have been previously described above. 
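
The histogram-based distance computed by unit 338 can be illustrated with a minimal sketch. The function names, the 256-level gray scale, and the representation of a frame as a list of pixel rows are assumptions made for illustration only; they are not taken from Fig. 41.

```python
def gray_histogram(frame, levels=256):
    """Count how many pixels fall on each gray level (frame is a list of rows)."""
    hist = [0] * levels
    for row in frame:
        for pixel in row:
            hist[pixel] += 1
    return hist


def histogram_distance(reference_frame, current_frame, levels=256):
    """Sum of absolute bin-by-bin differences between the two histograms.

    This mirrors the chain of histogram modules, absolute difference unit,
    and adder described for the distance computation unit.
    """
    h_ref = gray_histogram(reference_frame, levels)
    h_cur = gray_histogram(current_frame, levels)
    return sum(abs(a - b) for a, b in zip(h_ref, h_cur))


# Tiny 2 x 2 example frames (hypothetical data).
ref = [[10, 10], [20, 20]]
cur = [[10, 20], [20, 30]]
distance = histogram_distance(ref, cur)
```

Because the measure depends only on gray-level counts, it is insensitive to where pixels move within the frame, which is one reason a histogram distance suits the detection of global changes rather than local motion.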

With reference to Fig. 42, the Type 1 scene change 
detector module 340 will now be described. A comparator 358 
is included for receiving the distance or motion signal along 
line B2 from the distance computation unit 338, and comparing 
the signal with the Type 1 threshold signal T1, whereby if B2 
is greater than T1, the output of the comparator is indicative 
of a digital "1", whereas if the signal B2 is less than T1, 
the output of the comparator 358 is representative of a 
digital "0". The output from comparator 358 is fed to both a 
detection signal unit 360 which acts as a buffer, and to a 
position output unit 362 which acts as a frame number 
generator for providing $F_{ref}$, which is set to the current frame 
number, along the line designated at a lower end as C2 and at 
an upper end as B1, as shown. Note that the buffer 360 is a 
noninverting buffer. Accordingly, if a digital "1" is 
provided as an input signal thereto, the output signal from 
the detection signal generation unit 360 will also be a 
digital "1". 

The Type 0 scene change detector module 342 will now be 
described in greater detail with reference to Fig. 43. As shown, a 
comparator 364 is included for comparing a distance or motion 
measurement signal D1 with the T0 threshold signal, for 
producing a digital "1" output if D1 is greater than T0, and a 
digital "0" if the distance D1 is less than the T0 threshold. 
Comparator 364 also receives a Type 1 detection signal along 
D1 from the Type 1 scene change detector module 340, for 
inhibiting comparator 364, and in turn inhibiting the Type 0 
scene change detector module 342 if a Type 1 scene change is 
detected by module 340. The output of comparator 364 is 
connected as an input to a position output unit 366 which 
provides along B1 the $F_{ref}$ signal, which is set equal to the 
frame number of the previous frame. Also, the position output 
unit 366 provides a signal along D3 indicative of whether a 
Type 0 scene change has been detected between frames. 

The GOP structure generation unit 344 of Fig. 40 will now 
be described in greater detail with reference to Fig. 44. As 
shown, unit 344 includes two memories 368 and 370 for storing 
Type 0 scene change positions, and Type 1 scene change 
positions, respectively. These memories 368, 370 are 
individually connected to inputs of a reference frame 
positioning unit 372, which determines I and P frame positions 
based upon the detected scene changes. One output from the 
reference frame positioning unit is connected to a B frame 
positioning unit 374, for designating remaining frame 
positions not designated as I or P frames, as B frames. The 
reference frame positioning unit 372 is also connected to a 
PMODE memory 378. The PMODE memory 378 also receives the B 
frame positions from unit 374, and serves to store the I, P 
and B frame positions for the GOP being processed. In this 
example, the PMODE memory 378 contains 16 registers designated 
as the 0 through 15 registers relative to frames "0 to 15", 
respectively. The PMODE memory 378 receives along line E2 a 
PMODE memory control signal, and outputs along line E3 a PMODE 
read signal. 

The scene change detector controller module 346 of Fig. 
40 will now be described in greater detail with reference to 
Fig. 45. Controller 346 includes a control circuit 380 that 
can include either a small microprocessor or be provided by 
hardwired logic. The control circuit 380 is receptive of a 
PMODE read signal along line E3, and outputs a PMODE 
memory control signal along line E2, relative to the GOP 
structure generation unit 344 (see Fig. 40). The control circuit 
also outputs the scene change detector control signal SCDCS. As 
further shown, control circuit 380 is connected to a picture 
mode signal (PMS) generation unit 384, and to a frame 
sequencing unit 382. The frame sequencing unit 382 acts as a 
buffer, and functions to receive frame data along line F0 from 
frame memory 336, the data being representative of the actual 
image data of the frame being processed, whereby the frame 
sequencing unit 382 provides frame data as an output along 
line A2 representative of the frame being processed. Also, 
the picture mode signal generation unit 384 provides along 
line A1 the picture mode signal (PMS) that represents a switch 
control signal described in detail below, for permitting 
detailed identification of the frame being processed. 

In Fig. 46 the motion compensation module 318 shown in 
Fig. 39 is shown in detail. The components enclosed within a 
dashed line rectangular area 385 represent a motion 
compensation circuit known in the prior art. The present 
inventors added two modules to the known motion compensation 
network, which modules include a telescopic motion estimator 
(ME) module 396, and a switch control block module 408, as 
shown. The prior motion compensation network 385 includes one 
IMODE or intraframe mode switch 386, and four BMODE switches 
388, 394, 400, and 402, respectively. Three other data 
transfer switches 398, 410 and 412 are also included, as 
shown. Each one of these switches is controlled by an MSC or 
mode switch control signal, as shown, which signal is 
generated by a switch control block 408 in response to the 
picture mode signal PMS received along line A1. The truth 
table for switch control block 408 is shown in Fig. 47 as 
Table 414. Also, as shown, each of the switches is operable 
in response to the MSC signal for switching its associated 
switch arm between "0" and "1" contacts. A B frame 
compensator 390 has individual connections to switches 388, 
394, 400, and 402. A P frame compensator 392 has individual 
connections to switches 388, 394, and 398, respectively. 
Switch 394 has its switch arm connected to a motion estimator 
module 396. Switch 398 has its arm connected to the P frame 
compensator 392, and its "1" and "0" contacts connected to the 
"0" contact of switch 400, and the "0" contact of switch 402, 
respectively. The switch arm of switch 400 is connected in 
common to a frame memory F0 404, and via an H2 line to a 
motion estimator module 396. A switch arm of switch 402 is 
connected in common to a frame memory F1 406, and via a signal 
line H3 to motion estimator module 396. Frame memories 404 
and 406 are also connected to the "1" contacts of switches 410 
and 412, respectively. The switch arms of 410 and 412 are 
connected in common along line G2 to a summing junction 323 
(see Fig. 39). 

Operation of the motion compensation module 318 will now 
be described with reference to Figs. 46 and 47. The B frame 
compensator module 390 and P frame compensator module 392 are 
operative for providing the necessary interpolation functions 
for processing B and P frames, respectively. When the PMS 
signal is "0" or "1", the IMODE switch 386 connects a 0 signal to 
summing junction 312. Switches 410 and 412 operate in an 
alternate manner for connecting an input of either frame 
memory F0 404 or frame memory F1 406 to the summing junction 
323. These switches are switched alternately, connecting 
memories 404 and 406 in turn to the summing 
junction 323 so long as the value of the picture mode signal 
PMS is either 0, 1, 2, or 3. However, if the value of the PMS 
signal is either 4 or 5, switches 410 and 412 remain in their 
last switch position so long as the value of the PMS signal 
remains at either 4 or 5. If the value of the PMS signal is 
either 0 or 1, switch 386 is operated for connecting 0 volt to 
summing junction 312 along signal line G0; B mode switch 388 
is operated for disconnecting the output of the P frame 
compensator 392 from the circuit or network; B mode switch 394 
is operated for connecting the output of the motion estimator 
module 396 as an input to the P frame compensator 392; switch 
398 is operated for connecting the input of the P frame 
compensator 392 either through switch 400 or switch 402 to 
frame memories 404 and 406, respectively, depending upon which 
one of these frame memories is the most current that has been 
loaded with data; and switches 400 and 402 are operated for 
connecting the outputs of frame memories 404 and 406 to the 
inputs of the B frame compensator 390. When the value of the 
PMS signal changes to a "2" or "3", switches 386 and 388 are 
operated for connecting the output of the P frame compensator 
392 to the summing junction 312, and switches 400 and 402 are 
operated for individually connecting the outputs of memories 
404 and 406 to switch 398, whereby the latter operates to 
connect one of the frame memory outputs from memories 404 and 
406 to the input of the P frame compensator 392 dependent upon 
the position of the switch arm of switch 398 at the time. 



The motion estimator module 396 will now be described in 
greater detail. With reference to Fig. 48, motion estimator 
module 396 includes a motion vector memory 420 connected 
between a telescopic motion estimator controller 424 and 
estimator module 426. A frame memory 422 is connected to 
estimator 426, and telescopic motion estimator controller 424. 
Another frame memory 428 is connected to estimator module 426, 
and the controller 424. A third frame memory 430 is also 
connected to controller 424 and estimator module 426. The 
motion vector memory 420 is used for storing motion vectors for 
future and prior reference frames, and for B frames located 
between such reference frames. This memory is controlled and 
accessed via the telescopic ME controller 424. Frame memory 422 is 
used to store current frame data temporarily, until such data 
is required for use. The estimator module 426 performs the 
actual motion vector search, and uses any conventional motion 
estimator method, including a full search method, or any of 
the other prior methods as previously described herein. The 
telescopic ME controller 424 controls the timing for the other 
modules of the associated motion estimator module 396, and 
reads the motion vectors from the motion vector memory 420, 
and outputs the estimated motion vectors to the B frame 
compensator 390, or P frame compensator 392 via the H1 signal 
line. Frame memories 428 and 430 are used for storing 
reference frames, where at any given time one of these 
memories will store the immediate reference frame, and the 
other of these memories will store the future reference frame. 
Frame data is brought into the frame memory 428 via the H2 
signal line, and frame data is brought into the frame memory 
430 via the H3 signal line. 

The telescopic motion estimator controller 424 of the 
telescopic motion estimator module 396 will now be described 
in greater detail with reference to Fig. 49. As shown, 
controller 424 includes a memory 432 connected to a control 
circuit 436. The control circuit 436 is connected to two 
read only memories (ROMs) 434 and 438, respectively. ROM 434 
is associated with a forward motion estimation (FME) sequence, 
whereas ROM 438 is associated with a backward motion 
estimation (BME) sequence. Control circuit 436 is operative 
for providing the estimator control signal along signal line 
I3, the motion vector memory control signal along signal line 
I1, and the frame memory control signal along signal line I2, 
respectively. 

The bit rate controller 326 will now be described in 
greater detail with reference to Fig. 50. As shown, 
controller 326 includes a quantization parameter or QP 
adjustment module 440, and a target bit adjustment module 442. 
A truth table 444 is shown, which indicates how the scaling 
factor X for each picture type is changed in correspondence to 
the value of the picture mode signal PMS received by the 
quantization parameter adjustment module 440. This module is 
programmed for computing the equation Qp = 31X(F/J). By using 31 
steps in this quantization parameter equation, five bits may 
be used to designate the same. The ratio of buffer fullness F 
to buffer size J lies between zero and one, depending upon how 
many bits are associated with buffer 332 (see Fig. 39), i.e. 
from 0 to J. Typically, J represents the average bits needed to 
code about five pictures or 250Kbytes, when the bit rate is 
approximately 1.5 Mbyte/sec. Note with further reference to 
truth table 444, that when the PMS has a value of "0", the 
scaling factor X is 1, in association with an I1 frame. When the 
PMS signal is "1", the scaling factor X is 10, in association 
with an I2 frame, which is a coarsely quantized frame in this 
example. When the PMS signal has a value "2", the scaling 
factor X is 1, in association with a P1 frame. When the PMS 
signal is "3", the scaling factor is 3, in association with a 
P2 frame, which is a somewhat coarsely quantized frame. When 
the PMS signal is "4", the scaling factor X is 2, in 
association with a B1 frame. Lastly, when the PMS signal is "5", 
the scaling factor X is 2, for a B2 frame. Further note that the 
target bit adjustment module 442 operates to compute the equation 
$D_t = X_t T_{GOP} / E_{GOP}$, whereby 
the legend in Fig. 50 defines each one of the components of 
the equation for $D_t$, i.e. the target bit allocation for picture 
type t. 
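
A minimal sketch of the quantization parameter adjustment is given below. It assumes that the scaling factor X of truth table 444 multiplies the buffer-fullness term, and that the result is rounded and clamped to the 31 nonzero step sizes; neither the rounding nor the clamping is stated in the text, and the names used are hypothetical.

```python
# PMS value -> scaling factor X, as listed for truth table 444.
SCALING_FACTOR = {0: 1, 1: 10, 2: 1, 3: 3, 4: 2, 5: 2}


def quantization_parameter(pms, fullness, buffer_size):
    """Return a 5-bit quantization parameter in the range 1..31.

    Assumption: Qp = 31 * X * (F / J), rounded to the nearest integer and
    clamped so that it always fits the 31 available step sizes.
    """
    x = SCALING_FACTOR[pms]
    qp = round(31 * x * fullness / buffer_size)
    return max(1, min(31, qp))


# With the buffer one quarter full, a P2 frame (PMS = 3) is quantized more
# coarsely than an I1 frame (PMS = 0), and an I2 frame (PMS = 1) saturates.
qp_i1 = quantization_parameter(0, 25, 100)
qp_p2 = quantization_parameter(3, 25, 100)
qp_i2 = quantization_parameter(1, 25, 100)
```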

The embodiment of the invention for BS E-TAMI (Binary 
Search Equidistant TAMI), as presented above in association 
with the algorithm shown in Fig. 31, can be carried out using 
the general hardware configuration of Fig. 39. However, the 
scene change detector or SCD 310 is configured differently 
than for other embodiments of the invention. A generalized 
block diagram of the required SCD 310 configuration is shown 
in Fig. 51. The frame memory 336, GS or GOP structure 
generation unit 344, and the SCD controller 346, are as 
previously described for the embodiment of Fig. 40. The 
difference lies in the use of the binary search unit 450 
between the frame memory 336 and the GOP structure generation 
unit 344, as shown. 

The configuration of the binary search unit 450 is shown 
in greater detail in Fig. 52. The threshold unit 452 operates 
to compute the value for m, the middle point of the search 
region of the frame being searched. Note that the frame data 
is carried over bus B0 from the frame memory 336. Control 
unit 454 operates to provide the appropriate timing and binary 
search control signal BSCS. Control unit 454, in response to 
the outputs of the scene change detectors 340 and 342, 
provides a signal indicative of a Type 0 scene change along 
output line D3, and the frame position for the Type 0 scene 
change along line C2. The search region computation unit 456 
determines the next search region for conducting a binary 
search from one step to another of the associated algorithm, 
whereby the searching is conducted in an iterative manner, as 
shown in Fig. 31. 



Subband Video Coding with TAMI: 



It is known in the art to subsample a discrete-time 
signal as a step in separating its lowpass and highpass 
frequency components. This technique has been extended to 
processing of an image or a frame of video by applying 
subsampling along each of the two spatial directions. When 
following appropriate highpass and lowpass filters, 
subsampling by a factor of two in each spatial direction 
decomposes an image or video frame into four subimages, each 
of which has one quarter as many pixels as the original image. 
These subimages (subband images) may be labelled as $(L_v, L_h)$, 
$(L_v, H_h)$, $(H_v, L_h)$, and $(H_v, H_h)$, where the uppercase letters 
denote the type of filter (H = highpass, L = lowpass) and the 
subscripts denote the spatial processing direction (v = 
vertical, h = horizontal). The subband images may be 
recombined by interpolation and filtering to reconstruct the 
original image from which they are derived. Any or all of the 
subband images may be further processed by the same steps to 
produce subband images of smaller sizes. In transmission or 
storage system applications, frequency selective image coding 
may be achieved by using different coding methods on different 
subband images. 

In yet another embodiment of the invention, a new subband 
video coding algorithm with temporally adaptive motion 
interpolation was conceived. In this embodiment, the 
reference frames for motion estimation are adaptively selected 
using temporal segmentation in the lowest spatial subband of a 
video signal. Variable target bit allocation for each picture 
type in a group of pictures is used to allow use of a variable 
number of reference frames with the constraint of constant 
output bit rate. Blockwise DPCM, PCM, and run-length coding 
combined with truncated Huffman coding are used to encode the 
quantized data in the subbands. As shown below, the 
thresholds for the Type 1 and Type 0 scene changes are 
adjusted in direct proportion to the number of pixels 
subsampled in a frame. Simulation results of the adaptive 
scheme compare favorably with those of a non-adaptive scheme. 

Subband coding is known as having an inherent 
hierarchical resolution structure, which is suitable for 
prioritized packet video in ATM networks. Another known 
approach for video coding is motion compensation which has 
been recently standardized into the MPEG standard. Temporal 
redundancy reduction methods using subbands may be classified 
into two approaches. One is 3D spatio-temporal subband 
coding, and the other is motion compensated 2D subband coding. 
The present subband embodiment applies the latter approach in 
implementing the present fixed output bit rate subband video 
encoder shown in Fig. 53, and it provides improved performance 
in removing temporal redundancy due to motion. 

In the known motion compensated 2D subband coding system 
of Y.Q. Zhang and S. Zafar, described in their paper "Motion 
Compensated Wavelet Transform Coding for Color Video 
Compression", IEEE Trans. Circuits Syst. Video Technol., Vol. 
2, No. 3, pp. 285-296, Sept. 1992, for purposes of determining 
motion vectors, two stages of subband decomposition are used, 
whereby the $(L_v, L_h)$ subband image is decomposed by a second stage of low 
pass filtering and subsampling. This produces a low frequency 
subband image with one-sixteenth the number of pixels as the 
original image, comprised of the lowest one-fourth of the 
horizontal and vertical spatial frequency components of the 
original image, the so-called "lowest subband" image. 

In the present subband embodiment, each picture of the 
input video is decomposed into subbands by using biorthogonal 
filters. The motion compensation scheme uses temporally 
adaptive motion interpolation (TAMI), as previously described. 
The number of reference frames and the intervals between them 
are adjusted according to the temporal variation of the input 
video. 

More specifically, the algorithm for the present subband 
embodiment is a slightly modified version of TAMI as described 
above. It is modified to allow a variable number of P frames 
in each GOP. The new subband TAMI algorithm takes the 
following steps for each GOP (group of pictures): 

(1) It detects the positions of scene change of Type 1; 

(2) It detects the positions of scene change of Type 0; 
and 

(3) It determines the positions of all I, P, B frames. 

For this embodiment, I, P, and B frames with full bit 
allocation are denoted as I1, P1, and B frames, an I frame 
with reduced bit allocation as I2, and a P frame with reduced bit 
allocation as P2, as with other embodiments of the invention. 

Two types of scene detectors are required in the 
algorithm, as previously described for other embodiments of 
the invention. The first detector declares a scene change of 
Type 1 for the current frame when a distance measure between 
the current frame $f_n$ and the immediate past frame $f_{n-1}$ is above a 
threshold $T_1$. This type of scene change corresponds to an 
actual scene content change; it is coded as an I2 frame (very 
coarsely quantized intra frame), and the immediate past frame 
$f_{n-1}$ is coded as a P2 frame (very coarsely quantized predicted 
frame). The I2 frame coding exploits the forward temporal 
masking effect, and the P2 frame coding takes advantage of 
the backward temporal masking effect. The second detector 
declares a scene change of Type 0 for the current frame when 
the distance measure between the current frame and the last 
reference frame is above a threshold $T_0$. In this case the 
immediate past frame $f_{n-1}$ becomes a P1 frame. 

As indicated above, the reference frame assignment strategy 
using Type 0 scene change detection is that the end frame of 
every temporal segment determined by a Type 0 scene change is 
a P1 frame, and that the frames in between should be B frames. 
Examples of GOP structures generated by the TAMI algorithm are 
as previously shown in Fig. 8A and Fig. 8B. 
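
The two detectors and the resulting frame designation can be sketched as follows. This is an illustrative simplification, not the procedure of the figures: it assumes that every GOP opens with an I1 frame, that a frame already designated as a reference is never relabelled, and it omits any special handling at the end of the GOP. The function name and the scalar "frames" of the example are hypothetical.

```python
def assign_frame_types(frames, distance, t1, t0):
    """Label each frame of one GOP as I1, I2, P1, P2 or B."""
    types = ["B"] * len(frames)
    types[0] = "I1"   # assumption: the GOP starts with a full I frame
    ref = 0           # index of the last reference frame
    for n in range(1, len(frames)):
        if distance(frames[n], frames[n - 1]) > t1:
            # Type 1: abrupt content change between frames n-1 and n.
            types[n] = "I2"
            if types[n - 1] == "B":
                types[n - 1] = "P2"
            ref = n               # reference resets to the current frame
        elif distance(frames[n], frames[ref]) > t0:
            # Type 0: motion accumulated since the last reference is too
            # large, so the immediate past frame closes the segment.
            # The Type 1 test takes precedence, as in the inhibit signal.
            if types[n - 1] == "B":
                types[n - 1] = "P1"
            ref = n - 1           # reference resets to the previous frame
    return types


# Scalar stand-ins for frames; the distance is a plain absolute difference.
example = assign_frame_types(
    [0, 1, 2, 3, 4, 50, 51],
    distance=lambda a, b: abs(a - b),
    t1=20,
    t0=2.5,
)
```

In this example, slow drift triggers one Type 0 change (frame 2 becomes P1), and the jump from 4 to 50 triggers a Type 1 change (frame 4 becomes P2, frame 5 becomes I2).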

In Fig. 53 a block diagram for the subband coding system 
460 using TAMI is shown. The TAMI algorithm using 
multi-resolution motion estimation is applied via SCD 310 and 
motion estimator 462 on the lowest of the spatial subbands after 
subband decomposition via the subband analysis module 464. 
The motion compensated data are then quantized via quantizer 
316 and encoded by variable length coding in VLC module 324 
using a Huffman code scheme, for example. The buffer 332 is 
necessary to convert the output encoded data from variable 
rate to constant channel bit rate. 

Two stages of subband filtering are used to generate 
seven spatial bands "1 through 7" as shown in Fig. 54. In 
this example, the filters are separable 2D filters that use 
low-pass and high-pass biorthogonal filters as taught by 
D. LeGall and A. Tabatabai in their paper entitled "Subband 
Coding of Digital Images using Symmetric Kernel Filters and 
Arithmetic Coding Techniques" (Proc. ICASSP 88, pages 761-763, 
April 1988), for subband analysis: 

$$H_l(z) = \frac{1}{8}\left(-1 + 2z^{-1} + 6z^{-2} + 2z^{-3} - z^{-4}\right) \qquad (32)$$

$$H_h(z) = \frac{1}{2}\left(1 - 2z^{-1} + z^{-2}\right) \qquad (33)$$

The constant factor, 1/8, for the low-pass filter of equation 
(32) was chosen to provide a DC gain of 1. The corresponding 
synthesis low-pass and high-pass filters are $G_l(z) = H_h(-z)$ and 
$G_h(z) = -H_l(-z)$. These pairs of filters each have a perfect 
reconstruction property with three samples delay. In other 
words, it is easy to show 

$$\hat{X}(z) = z^{-3} X(z) \qquad (34)$$

where $X(z)$ is the input and $\hat{X}(z)$ is the reconstructed signal. 
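
The perfect reconstruction property can be checked numerically with a short sketch. It assumes the coefficients of equations (32) and (33) as read here, namely (-1, 2, 6, 2, -1)/8 and (1, -2, 1)/2, and the standard two-channel filter bank relations for the transfer and alias terms; the helper names are hypothetical.

```python
from fractions import Fraction


def convolve(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def negate_z(h):
    """Coefficients of H(-z): odd powers of z^-1 change sign."""
    return [c if k % 2 == 0 else -c for k, c in enumerate(h)]


# Analysis filters, as read from equations (32) and (33).
H_LOW = [Fraction(c, 8) for c in (-1, 2, 6, 2, -1)]
H_HIGH = [Fraction(c, 2) for c in (1, -2, 1)]

# Synthesis filters: G_l(z) = H_h(-z), G_h(z) = -H_l(-z).
G_LOW = negate_z(H_HIGH)
G_HIGH = [-c for c in negate_z(H_LOW)]

# Transfer term: (1/2) [H_l(z) G_l(z) + H_h(z) G_h(z)].
transfer = [
    (a + b) / 2
    for a, b in zip(convolve(H_LOW, G_LOW), convolve(H_HIGH, G_HIGH))
]

# Alias term: H_l(-z) G_l(z) + H_h(-z) G_h(z), which must vanish.
alias = [
    a + b
    for a, b in zip(
        convolve(negate_z(H_LOW), G_LOW),
        convolve(negate_z(H_HIGH), G_HIGH),
    )
]
```

The only nonzero coefficient of `transfer` sits at index 3, which corresponds to the three-sample delay of equation (34), and every coefficient of `alias` is zero.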

The temporal segmentation (scene change detection) 
algorithm is applied on the lowest of the subbands, so that 
the amount of computation is reduced by a factor of sixteen due 
to the reduced picture size of the lowest band. 

The multi-resolution approach for motion estimation 
provided by MRME 462, will now be described. The resolution 
level is set to s, which corresponds to the subband filtering 
stage. Let the maximum filtering stage be denoted by S, which 
is 2 in Fig. 54. In Fig. 54, s = 2 for bands (1, 2, 3, 4) and 
s = 1 for the others. The initial motion vectors are 
estimated only in band 1, and they are scaled to generate 
motion vectors for other subbands, as follows: 

$$d_i(x,y) = d_S(x,y)\,2^{S-i} + \Delta_i(x,y) \qquad (35)$$

where $d_i(x,y)$ is the motion vector at block position (x,y) in 
resolution level i, $d_S(x,y)$ is the initial motion vector, and 
$\Delta_i(x,y)$ is the correction motion vector found by reduced search 
area. The initial motion vector, $d_S$, is estimated by a full 
search with search range 4 × 4, where the block size is also 
4 × 4. In a computerized simulation, the inventors set $\Delta_i(x,y)$ 
= (0,0) because the overhead bits for the correction usually 
exceeded the saving of data bits. 
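
The scaling of equation (35) can be illustrated with a minimal sketch. It assumes the exponent is S - i, so that a vector found at the coarsest level S doubles for each step toward finer resolution; the function and parameter names are hypothetical.

```python
def scale_motion_vector(initial_mv, max_stage, level, correction=(0, 0)):
    """Scale the initial motion vector d_S to resolution level `level`.

    initial_mv : (dx, dy) estimated in the lowest band (level S)
    max_stage  : S, the maximum filtering stage (2 in Fig. 54)
    level      : i, the resolution level of the target subband
    correction : optional refinement vector, (0, 0) as in the simulation
    """
    factor = 2 ** (max_stage - level)
    return (
        initial_mv[0] * factor + correction[0],
        initial_mv[1] * factor + correction[1],
    )


# A vector found in band 1 (level 2) is reused unchanged for bands 2 to 4
# and doubled for bands 5 to 7 (level 1).
mv_level_2 = scale_motion_vector((3, -1), max_stage=2, level=2)
mv_level_1 = scale_motion_vector((3, -1), max_stage=2, level=1)
```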

To allow a variable number of P1 reference frames in a 
GOP, a variable target bit allocation scheme is updated at the 
beginning of each GOP, as described above for the adaptive 
selection of reference frames. Hence, the formula for target 
bit allocation is the same as equation (30) given above. 

Within a GOP, the target bit allocation for each picture 
type is also allowed to vary to be adaptive to the changing 
scene complexity of the actual video sequence. The number of 
bits generated for the previous picture having the same 
picture type is used as the target bit allocation. When the 
number of bits produced for one frame deviates from the target 
number of bits, the bit allocation for the next picture is 
adjusted to maintain an acceptable range of bit rate according 
to the equation: 

$$D_t = X_t \frac{T_{GOP}}{E_{GOP}} \qquad (36)$$

where t is a picture type, with $t \in \{I1, I2, P1, P2, B\}$, $D_t$ is the 
target bit allocation for picture type t, $X_t$ is the number of 
generated bits for the previous frame of the type t, $E_{GOP}$ is 
the expected GOP bit rate computed by the most recent data of 
bits generated for each frame type, and $T_{GOP}$ is the target GOP 
bit rate. $T_{GOP}$ is computed by M(R/30), where M is the GOP size 
and R is the target bit rate (bits/sec). $E_{GOP}$ can be computed 
by the equation: 

$$E_{GOP} = \sum_{t \in K_{GOP}} n_t X_t \qquad (37)$$

where $K_{GOP}$ is the set of all picture types used in the current 
GOP, $n_t$ is the number of the frames of picture type t in the 
GOP, and $X_t$ is either the generated bits for the previous frame 
of the type t or the initial bit allocation for picture type t 
when the picture is at the beginning of a GOP. 
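
A minimal sketch of the target bit allocation follows, assuming equation (36) has the form D_t = X_t T_GOP / E_GOP and that E_GOP is the sum of n_t X_t over the picture types in the GOP. The frame counts and bit figures are hypothetical values chosen only to exercise the arithmetic.

```python
def expected_gop_bits(frame_counts, bits_per_type):
    """E_GOP: sum over picture types of n_t * X_t."""
    return sum(n * bits_per_type[t] for t, n in frame_counts.items())


def target_gop_bits(gop_size, bit_rate, frame_rate=30):
    """T_GOP = M * (R / 30) for a 30 frames/sec source."""
    return gop_size * bit_rate / frame_rate


def target_bits(picture_type, frame_counts, bits_per_type, gop_size, bit_rate):
    """D_t = X_t * T_GOP / E_GOP."""
    e_gop = expected_gop_bits(frame_counts, bits_per_type)
    t_gop = target_gop_bits(gop_size, bit_rate)
    return bits_per_type[picture_type] * t_gop / e_gop


# Hypothetical GOP of 15 frames at 1.5 Mbit/sec.
counts = {"I1": 1, "P1": 4, "B": 10}
last_bits = {"I1": 150_000, "P1": 60_000, "B": 20_000}
targets = {
    t: target_bits(t, counts, last_bits, gop_size=15, bit_rate=1_500_000)
    for t in counts
}
# The rescaled allocations add up to the target GOP budget.
gop_total = sum(counts[t] * targets[t] for t in counts)
```

The rescaling keeps the relative cost of each picture type as last observed while forcing the GOP total back onto the constant-rate budget, which is what allows the number of P1 frames to vary from GOP to GOP.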

There are two other concerns for bit rate control; one is 
to adjust actual coding bits to target bit allocation, and the 
other is the buffer 332 content control to prevent a buffer 
overflow or underflow problem. Both of these control problems 
are handled by the following algorithm. At the end of each 
slice, the buffer 332 content is updated by $F = F + S_{gen} - S_t$, 
where F is the buffer 332 content, $S_{gen}$ is the number of bits 
generated for the slice, and $S_t$ is the number of target bits 
for the slice. To maintain stable buffer behavior and 
regulate the generated bit stream to be close to the target 
bit allocation per frame, the quantization parameter, Qp, is 
adjusted according to the buffer fullness by using the 
relation: 

$$Q_p = 31 \times \frac{F}{J} \qquad (38)$$

where J is the buffer size, taken as the amount of raw data in 
about three to five pictures, and the number 31 means there 
are 31 different nonzero step sizes which are coded by 5 bits. 
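
The slice-level control loop can be sketched as follows. The rounding of equation (38) to an integer and the clamping to the range 1 to 31 are assumptions suggested by the 31 nonzero step sizes, and the slice figures are hypothetical.

```python
def update_buffer(fullness, generated_bits, target_bits):
    """F = F + S_gen - S_t, applied at the end of each slice."""
    return fullness + generated_bits - target_bits


def qp_from_fullness(fullness, buffer_size):
    """Qp = 31 * F / J, rounded and clamped to 1..31 (assumed)."""
    qp = round(31 * fullness / buffer_size)
    return max(1, min(31, qp))


buffer_size = 250_000
fullness = 100_000
qp_before = qp_from_fullness(fullness, buffer_size)

# One slice that overshoots its target by 2000 bits raises the fullness,
# which in turn coarsens the quantizer for the following slice.
fullness = update_buffer(fullness, generated_bits=12_000, target_bits=10_000)
qp_after = qp_from_fullness(fullness, buffer_size)
```

Because the quantizer step grows with buffer fullness, an overshoot is followed by coarser quantization and fewer bits, which is the negative feedback that keeps the buffer from overflowing.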

Blockwise DPCM and a uniform quantizer are used only for 
the subband 1 of an I frame. In the DPCM, horizontal 
prediction from the left is used except for the first column 
where vertical prediction from above is used as in Fig. 55. 
All subbands except the lowest subband of an I frame, 
regardless of the picture type, are coded by PCM (pulse code 
modulation) with a deadzone quantizer because there is little 
spatial correlation in high-pass filtered subbands as well as 
motion compensated residual images. Horizontal and vertical 
scan modes are chosen according to the low-pass filtered 
direction. Hence, the horizontal scan mode in Fig. 56 is used 
for bands 2 and 5 (see Fig. 54). This mode is also used for 
bands 4 and 7 (see Fig. 54) by default because the bands are 
high-pass filtered in both directions. The vertical mode of 
Fig. 57 is used for bands 3 and 6 (see Fig. 54). 

These scan mode selections contribute to statistical 
redundancy reduction by run-length encoding combined with 
Huffman coding. The Huffman coding table is generated from 
run-length histogram data obtained from several training image 
sequences including Tennis, Football, and Flowergarden. 
Truncated Huffman coding is actually used to limit codeword 
length. The codeword entries having length larger than 17 
bits, are replaced by fixed length codewords of either 20 or 
28 bits which are defined in the MPEG standard. 

Simulations were carried out using the Tennis and 
Football sequences to compare the TAMI algorithm to a fixed 
scheme. Block type decisions for P and B frames as in MPEG, 4 
X 4 block size for s = 1, 8 X 8 block size for s = 2, and 
telescopic searching having half-pixel accuracy for motion 
estimation are used. As for the two temporal segmentation 
algorithms, difference of histograms of gray levels was 
selected for the distance measure. The thresholds used are 
$0.25 N_{pix}$ for Type 1 detection, and $0.1 N_{pix}$ for Type 0 detection, where 
$N_{pix}$ is the number of pixels in a single frame. All three color 
components (Y, U, and V) are encoded, and the bit rate results 
are computed by summing the compressed data bits for the three 
color components, the bits for quantization parameter, the 
motion vector bits, and the header bits, but the SNR curves 
are for Y components only. 



Fig. 58 shows a table of performance comparisons of 
average SNR and bits. It shows that TAMI is better than a 
nonadaptive scheme by 0.9 dB for Tennis and 0.7 dB for 
Football. Although the SNR difference between the two schemes 
has been shown to be slight, TAMI has a more stable picture 
quality than the fixed scheme. TAMI automatically inserts 
more P frames by detecting the global motion, such as zooming, 
for example. 

In a real time display of the reconstructed sequences of 
Tennis and Football, the quality differences between TAMI and 
a nonadaptive scheme were much more noticeable. The quality 
of TAMI was shown to be clearer and to have less blinking than 
that of the fixed scheme. 

It was shown by experiments that adaptive selection of 
reference frames for subband motion compensation compares 
favorably with the nonadaptive schemes in terms of both 
subjective quality and an objective measure, SNR. The trade- 
off is that it requires a certain amount of encoding delay 
because it needs to look ahead at GOP frames prior to 
encoding. The present embodiment provides a good scene 
adaptive scheme for removing temporal redundancy using motion 
compensation in subband domain. 

In Fig. 59, a block schematic diagram supplementary to 
and an expansion of Fig. 53 is shown, for giving greater 
details of a system for the subband video encoding embodiments 
of the invention. The system as shown is very similar to the 
encoding system shown in Fig. 39, and like components relative 
to the latter are shown in Fig. 59 with the same reference 
designation, which components provide the same functions as 
described for Fig. 39. One difference is that the motion 
compensation module 318 of Fig. 39 is replaced by the 
multi-resolution motion estimation module 462. Such a 
multi-resolution motion estimation module 462 is known in the art, 
and is described in the paper of Y.Q. Zhang and S. Zafar, 
Ibid. Also, the discrete cosine transform module 314, and 
inverse discrete cosine transform module 322 of Fig. 39 are 
not included in the system of Fig. 59. Another difference is 
that a scan mode switch 321 is included between the quantizer 
316 and the variable length coding module 324 in Fig. 59. The 
purpose of the scan mode switch 321 is to switch between the 
various scan modes, for processing each subband. Lastly, 
another difference is that the subband analysis module 464 is 
included before the scene change detector 310 in the system of 
Fig. 59, and is not included in the system of Fig. 39. 

The subband analysis module 464 of Figs. 53 and 59 is 
shown in greater detail in Fig. 60. As shown, the subband 
analysis module 464 includes a timing controller module 526, 
responsive to a control signal CS for outputting a timing 
control signal TC that is used to control all of the various 
modules of the remainder of the system shown. Video data 
received along line KO is passed through a horizontal highpass 
filter 500, and a horizontal lowpass filter 502. The output 
of horizontal highpass filter 500 is passed through a 
subsampling module (decimator or downsampler) 505, which 
removes every other data sample; these samples are redundant 
after the filtering step, as is known in the art. The 
subsampled filtered video data is then passed through a 
vertical highpass filter 504 and vertical lowpass filter 506, 
respectively, the respective outputs of which are passed 
through subsampling modules 505, for providing the filtered 
subband data along output lines labelled "Band 7" and 
"Band 6", respectively. 

The filtered video data from the horizontal lowpass 
filter 502 is passed through a subsampling module 505, and 
provided as input data to both a vertical highpass filter 508 
and vertical lowpass filter 510. The filtered video output 
data from the vertical highpass filter 508 is passed through a 
subsampling module 505, and outputted as video subband data 
along an output line labeled "Band 5". 

The filtered output data from the vertical lowpass filter 
510 is passed through a subsampling module 505, and delivered 
therefrom as input video data to both a horizontal highpass 
filter 514, and a horizontal lowpass filter 516. The filtered 
video output data from the horizontal highpass filter 514 is 
passed through a subsampling module 505, and provided as 
filtered video input data to both a vertical highpass filter 
518 and vertical lowpass filter 520, the respective outputs 
of which are passed through respective subsampling modules 505 
and passed along subband lines shown as "Band 4" and "Band 3", 
respectively. 

The filtered video output data from horizontal lowpass 
filter 516 is passed through a subsampling module 505, and 
therefrom provided as filtered input video data to both a 
vertical highpass filter 522 and vertical lowpass filter 524, 
respectively, the respective outputs of which are passed 
through subsampling modules 505, respectively, and therefrom 
outputted onto subband lines "Band 2" and "Band 1", 
respectively. All of the aforesaid horizontal and vertical 
highpass and lowpass filters enclosed within the dashed line 
area designated 499 represent a known subband filtering 
system, as previously indicated, for providing double 
filtering. The present inventors added a multiplexer 512 for 
receiving the subband data from subband lines "Band 1" through 
"Band 7", respectively. The multiplexed output data is then 
provided along output line AO from multiplexer 512 for 
connection as input data to the scene change detector 310, as 
shown in Fig. 59. 
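
The filter tree described above for the subband analysis module 
464 can be sketched in a few lines. The sketch is only an 
illustration under stated assumptions: two-tap Haar filters 
stand in for the highpass and lowpass filters, whose actual 
coefficients are not given in this passage, frames are plain 
lists of rows with even dimensions, and all function names are 
hypothetical. The band numbering follows the text: Bands 7, 6 
and 5 come from the first stage, and Bands 4 to 1 from a second 
stage applied to the lowest first-stage band. 

```python
def haar_split(signal):
    """Lowpass/highpass filter a 1-D sequence and keep every other sample."""
    pairs = range(0, len(signal) - 1, 2)
    low = [(signal[i] + signal[i + 1]) / 2 for i in pairs]
    high = [(signal[i] - signal[i + 1]) / 2 for i in pairs]
    return low, high


def split_rows(image):
    """Horizontal filtering and subsampling of every row."""
    parts = [haar_split(row) for row in image]
    return [p[0] for p in parts], [p[1] for p in parts]


def transpose(image):
    return [list(col) for col in zip(*image)]


def split_cols(image):
    """Vertical filtering and subsampling of every column."""
    low_t, high_t = split_rows(transpose(image))
    return transpose(low_t), transpose(high_t)


def one_stage(image):
    """Return (h-low/v-low, h-low/v-high, h-high/v-low, h-high/v-high)."""
    h_low, h_high = split_rows(image)
    ll, lh = split_cols(h_low)
    hl, hh = split_cols(h_high)
    return ll, lh, hl, hh


def seven_band_analysis(frame):
    """Two-stage analysis producing Bands 1 (lowest) through 7."""
    ll1, band5, band6, band7 = one_stage(frame)
    band1, band2, band3, band4 = one_stage(ll1)
    return {1: band1, 2: band2, 3: band3, 4: band4,
            5: band5, 6: band6, 7: band7}


bands = seven_band_analysis([[8] * 4 for _ in range(4)])
print(bands[1])  # -> [[8.0]]
```

For a frame of W by H pixels, each first-stage band holds one 
quarter of the pixels and each second-stage band one sixteenth, 
which matches the subimage sizes recited in Claims 42 and 43. 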

Although various embodiments of the invention have been 
shown and described herein, they are not meant to be limiting. 
Those of skill in the art may recognize certain modifications 
to these embodiments, which modifications are meant to be 
covered by the spirit and scope of the appended claims. 



WHAT IS CLAIMED IS: 

1. A method for compressing video data comprising the 
steps of: 

determining the degree of global motion between 
frames of video data; and 

adjusting the spacing between reference frames 
relative to the degree of global motion measured between 
frames. 

2. The method of Claim 1, further including the steps 
of: 

establishing different threshold magnitudes or 
levels of motion between frames as representing different 
scene change types; and 

assigning different bit rates to individual frames 
based upon said pre-established threshold levels of motion 
between frames. 

3. The method of Claim 1, further including the step 
of: 

assigning a bit rate for video coding each frame 
based upon the degree of global motion measured between 
frames. 



4. The method of Claim 2, further including the steps 
of: 

said threshold establishing step including the steps 
of designating a Type 1 scene change between a pair of 
successive frames as occurring whenever the measured motion 
therebetween exceeds a T1 threshold representing a substantial 
scene or picture change; and 

designating, pursuant to a Type 1 scene change, the 
first occurring or prior frame of the pair as a P2 frame, and 
the second occurring or past frame as an I2 frame, each being 
of predetermined bit rates via said assigning step. 



5. The method of Claim 2, further including the steps 
of: 

said threshold establishing step including a step of 
designating a Type 0 scene change between frames occurring 
whenever the measured motion therebetween exceeds a T0 
threshold representing substantial motion in a scene or 
picture; and 

detecting when the cumulative motion from an 
immediately preceding reference frame and a successive frame 
exceeds a T0 threshold, for designating the immediately prior 
frame to said successive frame as a P1 frame of predetermined 
bit rate via said assigning step, said P1 frame being a 
reference frame. 



6. The method of Claim 4, further including the steps 
of: 

said threshold establishing step including a step of 
designating a Type 0 scene change between frames occurring 
whenever the measured motion therebetween exceeds a T0 
threshold representing substantial motion in a scene or 
picture; and 

detecting when the cumulative motion from an 
immediately preceding reference frame and a successive frame 
exceeds a T0 threshold, for designating the immediately prior 
frame to said successive frame as a P1 frame of predetermined 
bit rate via said assigning step, said P1 frame being a 
reference frame. 

7. The method of Claim 6, further including the steps 
of: 

designating successive frames between reference 
frames as B frames, respectively. 

8. The method of Claim 6, further including the steps 
of: 

establishing a predetermined number of successive 
frames as a group of pictures (GOP) thereby grouping said 
successive frames into a plurality of successive GOP's; and 

designating the first frame of each of said 
plurality of GOP's as an I1 frame of predetermined bit rate 
via said assigning step. 



9. The method of Claim 8, further including the steps 
of: 

designating frames as B1 frames between reference 
frames in each of said plurality of GOP's where at least one 
Type 0 scene change has been detected, said B1 frames each 
having a predetermined bit rate via said assigning step; and 

designating frames as B2 frames between reference 
frames in each of said plurality of GOP's where no Type 0 
scene changes have been detected, said B2 frames each having a 
predetermined bit rate via said assigning step. 

10. The method of Claim 9, wherein said I1 and I2 frames 
are intra frames, said P1 and P2 frames are predicted frames, 
and said B1 and B2 frames are bidirectionally interpolated 
frames. 

11. The method of Claim 9, wherein said assigning step 
further includes the steps of: 

assigning the relatively highest bit rate to I1 
designated frames; 

assigning the second highest bit rate to P1 
designated frames; 

assigning the third highest bit rate to B2 
designated frames; and 

assigning relatively lower bit rates to I2, P2, and 
B1 designated frames. 

12. The method of Claim 11, wherein said assigning step 
includes assigning bit rates of 200kB/sec. for I1 frames, 
100kB/sec. for P1 frames, greater than 10kB/sec. for B2 
frames, and 10kB/sec. for B1, I2, and P2 frames, 
respectively. 



13. The method of Claim 2, wherein said assigning step 
further assigns bit rates to said frames in a manner 
exploiting forward temporal masking in human vision, whereby 
at a scene change an immediately following frame is coarsely 
coded. 

14. The method of Claim 2, wherein said assigning step 
further assigns bit rates to said frames in a manner utilizing 
a backward temporal masking effect in a resulting coding 
scheme, whereby in instances where a scene change involves 
relatively large motion between two successive frames, the 
immediately past frame at the scene change and the immediately 
following frame are coarsely coded via the assignment of a 
relatively low bit rate. 

15. The method of Claim 3, further including the step 
of: 

assigning reference frames as intra (I) and 
predictive (P) based upon the degree of global motion measured 
between frames. 



16. The method of Claim 15, further including assigning 
bidirectional (B) interpolated frames between reference 
frames. 



17. The method of Claim 6, wherein the magnitude of said 
T1 threshold is made about four times greater than the 
magnitude of said T0 threshold, whereby the T1 threshold 
represents a complete scene or picture change between a pair 
of successive frames. 



18. The method of Claim 9, further including the step of 
gradually increasing to a full bit rate the bit rate of frames 
following a scene change to render the degradation of frames 
following a scene change imperceptible. 

19. The method of Claim 11, further including the step 
of using the designated positions of P1 and/or P2, and B1 
and/or B2 frames, for processing the frames through a motion 
compensated interpolation encoder, for encoding or compressing 
the associated video data. 

20. The method of Claim 19, further including the step 
of using a telescopic motion vector searching for determining 
motion between frames. 

21. The method of Claim 19, further including the steps 
of: 

sensing the bit rate being employed in said encoder 
in processing frames; and 

adjusting the coarseness of quantizing frames in 
said encoder in proportion to the deviation or error between 
an actual bit rate and a target bit rate. 

22. The method of Claim 21, further including the step 
of limiting the number of quantized P1 frames in high-motion 
segments between frames by replacing such P1 frames with B1 
frames for substantially maximizing the quality of the 
resultant picture. 

23. The method of Claim 11, further including the step 
of: 

allocating the same number of bits from GOP to GOP 
for I1, I2, P1, and P2 designated frames, respectively. 

24. The method of Claim 11, further including the step 
of: 

varying the number of bits allocated for I1, I2, P1, 
P2, B1, and B2 designated frames, respectively, from GOP to 
GOP in accordance with the number of detected Type 0 scene 
changes; and 

maintaining the ratio of bit allocations between I1, 
I2, P1, P2, B1, and B2 designated frames constant from frame 
to frame within a GOP. 



25. The method of Claim 21, wherein said adjusting step 
is in accordance with the following equation: 



TB - XTB 



TBR^p 



where TB is target bit allocation, XTB is the target bit 
allocation for the previous frame, ABR_GOP is the actual GOP 
bit rate, and TBR_GOP is the target GOP bit rate. 



26. The method of Claim 19, further including the step 
of: 

inserting N P1 frames into the structure of each GOP 
by default, for reducing encoding delays and distances between 
reference frames, where N is an integer number 1, 2, 3...; and 

processing each GOP through said encoder via 
encoding the first frame through successive frames to the 
first P1 default, and through the next successive frames to 
the next occurring P1 default, in an iterative manner until 
all frames have been encoded. 



27. The method of Claim 26, further including the step 

of: 

sizing each GOP to have an even number of frames 
whenever N is odd; and 

sizing each GOP to have an odd number of frames 
whenever N is even. 



28. The method of Claim 9, further including the step of 
computing the bit allocations for B2 frames to follow the 
equation: 

\B2\ = \B1\ ^ l^^l - l^^l 

where N is the number of P1 frames, M is the number of 
frames in each GOP, |B1| is the bit allocation for B1 frames, 
and |P1| is the bit allocation for P1 frames. 



29. The method of Claim 24, further including the step 
of: 

inserting extra P1 frames into temporally busy 
regions, for producing a more constant perceptual picture 
quality. 

30. The method of Claim 29, wherein said inserting step 
further includes the step of satisfying the following 
equation: 



where k is the number of P1 designated frames, M is the GOP 
size, R_I1, R_P1, and R_B1 are bit rate allocations for I1, 
P1, and B1 frames, respectively, and R is the channel bit rate 
per second, whereby R_I1, R_P1, and R_B1 are adjusted to keep 
R at a desired constant value. 



31. The method of Claim 1, wherein said global motion 
determining step uses any one of five different distance 
measures for temporal segmentation including difference of 
histograms (DOH), histograms of difference (HOD), block 
histogram difference (BH), block variance difference (BV), and 
motion compensation error (MCE). 

32. The method of Claim 2, wherein said assigning step 
includes the step of designating individual frames as intra 
frames (I), and/or predicted frames (P) based upon the 
magnitude of motion between associated frames exceeding a 
given threshold level. 

33. The method of Claim 6, further including the step 
of: 

optimally spacing said P1 designated frames for 
substantially optimizing the frame spacing between I1, I2, P1, 
and P2 reference frames. 



34. The method of Claim 33, wherein said optimal spacing 
step includes the steps of: 

minimizing the deviation from the mean of distances 
between frame positions initially designated P1 through Type 0 
threshold detection, and paired frame positions designated I2 
and P2, respectively, via Type 1 threshold detection; and 

determining the structure for each GOP based upon 
the optimally spaced P1 frames. 




35. The method of Claim 34, wherein said optimal spacing 
step includes the steps of exhaustively searching each frame 
for determining motion vectors. 

36. The method of Claim 34, wherein said optimal spacing 
step includes the steps of using a backward telescopic search 
for determining motion vector information between frames. 

37. The method of Claim 11, further including the steps 
of: 

varying the bit allocations for each I1, I2, P1, P2, 
and B1 frame types from GOP to GOP; and 

retaining the bit rate constant for each of said 
frame types. 

38. The method of Claim 37, further including in said 
bit allocation varying step the step of assigning target bit 
allocations for each GOP in accordance with the following 
formula: 



= C, — « 

2(C„ + NCp, + (M - N - DC^,) 

where D_t is the target bit allocation for picture Type t, t 
being either I1, I2, P1, P2, or B1; C_I1, C_P1, and C_B1. 



39. The method of Claim 33, wherein said optimally 
spacing step includes the step of conducting a binary search 
between frames to find substantially equidistant positions for 
Type 0 threshold change P1 designated frames. 

40. The method of Claim 1, further including the step of 
forming spatial subbands from the video data representing said 
frames, followed by said global motion determining step for 
determining the degree of global motion using selected spatial 
subbands of successive frames. 

41. The method of Claim 40, wherein said spatial subband 
forming step includes the steps of: 

forming derived images comprised of the lowpass or 
highpass spatial frequency components for one spatial 
direction and the lowpass or highpass spatial frequency 
components for the second spatial direction of the video data 
of each frame; and 

subsampling said derived images for each frame in 
both spatial directions for obtaining said spatial subbands. 



42. The method of Claim 40, wherein said spatial subband 
forming step includes the steps of: 

passing the video data of each frame in succession 
through a lowpass filter and highpass filter, respectively, 
for obtaining two filtered images in each spatial direction; 
and 

subsampling by a factor of two in each spatial 
direction, for sampling one filtered image thereof for each 
frame, respectively, wherein each resulting subimage has one- 
quarter as many pixels as an original image from which it was 
derived. 

43. The method of Claim 40, wherein said spatial subband 
forming step includes the steps of: 

passing the video data of each frame in succession 
through first and second stages of both lowpass and highpass 
filters, respectively, for obtaining second stage subband 
images comprised of the lowpass or highpass frequency 
components for one spatial direction and the lowpass or 
highpass spatial frequency components for the second spatial 
direction of the lowest first stage subband; 

subsampling said second stage subband images in each 
spatial direction for each frame, respectively, wherein each 
resulting subimage has one-sixteenth the number of pixels as 
an original image from which it was derived; and 

performing said global motion determining step using 
the lowest subband of said second stage subband images. 



44. The method of Claim 40, further including the steps 
of: 

establishing a predetermined number of successive 
frames as a group of pictures (GOP), for grouping successive 
frames into a plurality of successive GOP's; 

establishing predetermined thresholds of global 
motion between frames, for designating I and P type reference 
frames, with all other frames therebetween being B type 
frames; and 

varying the target bit allocation for each frame 
type adaptive to a changing scene complexity from one frame 
type to the next successive frame of said one frame type. 

45. In a video data compression system employing I, P, 
and B frames in accordance with the MPEG standard, the method 
comprising the steps of: 

predetermining thresholds of temporal activity 
between frames of groups of pictures (GOP), for designating I 
and P frames; and 

designating B frames for frames located between any 
one of a pair of I and P frames, I frames, or P frames, 
respectively. 



46. A system for compressing video data comprising video 
data associated with groups of pictures (GOP) including a 
predetermined number of frames, said system comprising: 

motion detection means for determining the degree of 
global motion between said frames; 



means responsive to said global motion measurements 
from said motion detection means for designating and adjusting 
the spacing between reference frames; and 

encoder means for encoding said reference frames. 

47. The system of Claim 46, wherein said designating 
means includes means for coding said reference frames as I 
and/or P, and B frames relative to global motion between 
frames. 

48. The system of Claim 46, wherein said motion 
detection means includes: 

Type 0 scene change detector means for detecting 
when the cumulative motion from an immediately preceding 
reference frame, and a successive frame exceeds a 
predetermined T0 threshold, whereby said designating means 
responds by designating the immediately prior frame to said 
successive frame as a P1 frame. 

49. The system of Claim 46, wherein said motion 
detection means includes: 

Type 1 scene change detector means for detecting 
when the global motion between two successive frames exceeds a 
predetermined T1 threshold representing a substantial scene or 
picture change, whereby said designating means responds by 
designating the first occurring of the two successive frames 
as a P2 frame, and the other or second occurring of the two 
successive frames as an I2 frame. 

50. The system of Claim 48, wherein said motion 
detection means further includes: 

Type 1 scene change detector means for detecting 
when the global motion between two successive frames exceeds a 
predetermined T1 threshold representing a substantial scene or 
picture change, whereby said designating means responds by 
designating the first occurring of the two successive frames 
as a P2 frame, and the other or second occurring of the two 
successive frames as an I2 frame. 



51. The system of Claim 50, wherein said frames are 
arranged in groups of pictures (GOP) each consisting of a 
predetermined number of successive frames, and said encoder 
means further includes: 

bit rate controller means for insuring that the 
number of bits being utilized in encoding a given GOP do not 
exceed the bit capability of said system. 

52. Apparatus for compressing video data contained in a 
group of frames including a reference frame comprising: 

means for determining the global motion between 
frames; 

means for classifying the types of frames in the 
group in response to the global motion so determined; 

and a motion compensation encoder for processing 
frames in accordance with their classification. 



53. Apparatus as set forth in Claim 52, wherein said 
means for classifying classifies a first of adjacent frames as 
a P2 frame and the later of said adjacent frames as an I2 
frame if the global motion between them exceeds a given value 
indicating a scene change. 

54. Apparatus as set forth in Claim 52, wherein a frame 
prior to a frame having global motion with respect to a 
previous reference frame in excess of a given value is 
designated as a P1 frame. 

55. Apparatus as set forth in Claim 52, further 
comprising means for classifying N frames in said group as P1 
frames by default. 

56. Apparatus as set forth in Claim 55, wherein the N 
frames classified as P1 frames occur at given frame intervals. 

57. Apparatus as set forth in Claim 55, wherein adjacent 
frames classified as P1 frames by default have the same amount 
of global motion between them. 

58. A system for compressing video data comprising video 
data associated with groups of pictures (GOP) including a 
predetermined number of frames, said system comprising: 

subband video coding means for receiving said video 
data, and extracting a plurality of spatial subbands from said 
video data, said spatial subbands as a whole representing 



subsampled pixels of individual frames, respectively; 

motion detection means for determining the degree of 
global motion between corresponding subsampled pixels of said 
spatial subbands of said frames, respectively; 

means responsive to said global motion measurements 
from said motion detection means for designating and adjusting 
the spacing between reference frames; and 

subband encoder means for encoding said spatial 
subbands of said reference frames, respectively. 

59. The system of claim 58, wherein said designating 
means includes means for coding said reference frames as I 
and/or P, and B frames relative to global motion between said 
spatial subbands of different ones of said frames, 
respectively. 



60. The system of Claim 58, wherein said motion 
detection means includes: 

Type 0 scene change detector means for detecting 
when the cumulative motion from spatial subbands of an 
immediately preceding reference frame, and corresponding 
spatial subbands of a successive frame exceeds a predetermined 
T0 threshold, whereby said designating means responds by 
designating the immediately prior frame to said successive 
frame as a P1 frame. 

61. The system of Claim 58, wherein said motion 
detection means includes: 

Type 1 scene change detector means for detecting 
when the global motion between corresponding spatial subbands 
of two successive frames exceeds a predetermined T1 threshold 
representing a substantial scene or picture change, whereby 
said designating means responds by designating the first 
occurring of the two successive frames as a P2 frame, and the 
other or second occurring of the two successive frames as an 
I2 frame. 

62. The system of Claim 60, wherein said motion 
detection means further includes: 

Type 1 scene change detector means for detecting 
when the global motion between corresponding spatial subbands 
of two successive frames exceeds a predetermined T1 threshold 
representing a substantial scene or picture change, whereby 
said designating means responds by designating the first 
occurring of the two successive frames as a P2 frame, and the 
other or second occurring of the two successive frames as an 
I2 frame. 

63. The system of Claim 62, wherein said frames are 
arranged in groups of pictures (GOP) each consisting of a 
predetermined number of successive frames, and said encoder 
means further includes: 

bit rate controller means for insuring that the 
number of bits being utilized in encoding the spatial subbands 
of a given GOP do not exceed the bit capability of said 
system. 
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LPN : last picture number 

CPICI : actual picture number 
corresponding to the first frame of a GOP 

i : index for a scene segment 

SCNUM : the number of scene segments 

I1 : Intra-coded picture of type 1 

SCDET : scene change detector 

MAIN : main encoding algorithm 



Fig. 32 



SCDET 

[Flowchart. Step reference numerals: 170, 171, 172, 178, 179, 
180, 183, 184. Decision branches are labelled "Yes". 
Recoverable box contents, in order:] 

Load GOP frames 

pcount = 0 
scindex = 0 

sct[scindex] = 2 
PNSCF[scindex] = 0 
scindex = scindex + 1 

Next frame 
(c = c + 1) 

sct[scindex] = 1 
PNSCF[scindex] = c 
PNSCL[scindex-1] = c-1 
scindex = scindex + 1 

sct[scindex] = 0 
PNSCF[scindex] = c 
PNSCL[scindex-1] = c-1 
pcount = pcount + 1 
scindex = scindex + 1 

PNSCL[scindex-1] = GOP 
SCNUM = scindex 

scindex : index for scene segment 
sct[.] : scene change type 
PNSCF[.] : picture number of the 1st frame 
of a scene segment 
PNSCL[.] : picture number of the last frame 
of a scene segment 

Fig. 33 



c : current frame number 
ref : previous reference frame 
pcount : count of P1 frames 
T1 : threshold for type 1 scene change detection 
T0 : threshold for type 0 scene change detection 
D( . , . ) : distance measure 

Cond. A : sct[scindex-1] = 1 & 
(PNSCL[scindex-1] - PNSCF[scindex-1]) = 0 



N-P TAMI SCDET 

[Flowchart. Step reference numerals: 170, 171, 172, 177, 178, 
183, 184, 200, 204, 205. Recoverable box contents, in order:] 

Load GOP frames 

DP = 0 
pcount = 0 
scindex = 0 

sct[scindex] = 2 
PNSCF[scindex] = 0 
scindex = scindex + 1 

Next frame 
(c = c + 1) 

sct[scindex] = 1 
PNSCF[scindex] = c 
PNSCL[scindex-1] = c-1 
scindex = scindex + 1 

sct[scindex] = 0 
PNSCF[scindex] = c 
PNSCL[scindex-1] = c-1 
scindex = scindex + 1 

sct[scindex] = 2 
PNSCF[scindex] = c 
PNSCL[scindex-1] = c 
scindex = scindex + 1 

PNSCL[scindex-1] = GOP 
SCNUM = scindex 

N : the number of default P frames 
DP : the count of previous default P frames 
A mod B : remainder of the division, A/B 
scindex : index for scene segment 
sct[.] : scene change type 
PNSCF[.] : picture number of the 1st frame 
of a scene segment 
PNSCL[.] : picture number of the last frame 
of a scene segment 
c : current frame number 
ref : previous reference frame 
pcount : count of P1 frames 
T0 : threshold for type 0 scene change detection 
T1 : threshold for type 1 scene change detection 
D( . , . ) : distance measure 

Cond. A : sct[scindex-1] = 1 & 
(PNSCL[scindex-1] - PNSCF[scindex-1]) == 0 

Fig. 34 



I1, I2, P1, P2, B1, B2 Coding 

[Flowchart. Step reference numerals: 250, 251, 252, 253, 254, 
255, 256. Recoverable box labels, in order:] 

DCT 
QUANT 
VLC 
IQUANT 
IDCT 
SAVE 
Adapt quantization step size 
Buffer Control 
Output 

Fig. 35 



DCT : discrete cosine transform 
VLC : variable length coding 
IQUANT : inverse quantization 
IDCT : inverse DCT 
SAVE : save the decoded result for later motion compensation 
QUANT : Quantizer having step sizes: 

Default : QS 
I1 : QS 
P1 : QS 
B1 : 2QS 
I2 : 10QS 
P2 : 3QS 
B2 : 2QS 




Fig. 36A 
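
The step sizes listed in the legend above can be read as 
multiples of a base step QS chosen per frame type. The sketch 
below illustrates that reading only: the uniform rounding 
quantizer, the function names, and the numeric values in the 
example are assumptions for illustration and are not taken 
from the figure. 

```python
# Step-size multipliers per frame type, as listed in the legend.
STEP_MULTIPLIER = {"I1": 1, "P1": 1, "B1": 2, "I2": 10, "P2": 3, "B2": 2}


def step_size(frame_type, qs):
    """Quantizer step for a frame type; unknown types use the default QS."""
    return STEP_MULTIPLIER.get(frame_type, 1) * qs


def quantize(coefficient, frame_type, qs):
    """Uniform quantization (assumed) of one transform coefficient."""
    return round(coefficient / step_size(frame_type, qs))


def dequantize(level, frame_type, qs):
    """Inverse quantization (IQUANT) matching quantize()."""
    return level * step_size(frame_type, qs)


# The same coefficient is coded far more coarsely in an I2 frame.
for frame_type in ("I1", "P2", "I2"):
    level = quantize(92, frame_type, qs=4)
    print(frame_type, level, dequantize(level, frame_type, qs=4))
```

The coarse I2 and P2 steps correspond to the frames adjacent 
to a Type 1 scene change, which Claims 13 and 14 describe as 
coarsely coded on the basis of temporal masking. 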



MAIN 

NP1 : count of past P1 frames 
PN : current picture number 
GOP : group of picture size, usually 15 
I1 coding : subroutine for I1 frame 
I2 coding : subroutine for I2 frame 
P1 coding : subroutine for P1 frame 
P2 coding : subroutine for P2 frame 
B1 coding : subroutine for B1 frame 
B2 coding : subroutine for B2 frame 

MEI : Telescopic motion estimation for all frames between 
[PNSCF[i], PNSCL[i]-1] 
MEP : Telescopic motion estimation for all frames between 
[PNSCF[i], PNSCL[i]] 
Prediction : Generate prediction image for P frame coding 
using forward motion vectors estimated in MEP. 
Interpolation : Generate interpolation image which is an 
average image of forward and backward prediction images using 
forward and backward motion vectors estimated in MEP or MEI. 

Fig. 36B 



MEP 

[Flowchart. Step reference numerals: 294, 295, 296. 
Recoverable box contents, in order:] 

FN = PNSCL[i] - 1 
MEB(FN, PNSCL[i]) 
FN = FN - 1 

FN : current frame number 
MEF(PNSCF[i], FN) : forward motion vector search between 
PNSCF[i] and FN 
MEB(FN, PNSCL[i]) : backward motion vector search between 
FN and PNSCL[i] 

Fig. 37 



MEI 

FN : current frame number 
MEF(PNSCF[i], FN) : forward motion vector search between 
PNSCF[i] and FN 
MEB(FN, PNSCL[i]) : backward motion vector search between 
FN and PNSCL[i] 

Fig. 38 
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